SOTAVerified

Multimodal Large Language Model

Papers

Showing 201–225 of 347 papers

| Title | Status | Hype |
| --- | --- | --- |
| Audio-Visual LLM for Video Understanding | | 0 |
| Automated radiotherapy treatment planning guided by GPT-4Vision | | 0 |
| Balancing Performance and Efficiency: A Multimodal Large Language Model Pruning Method based Image Text Interaction | | 0 |
| Beyond Retrieval: Joint Supervision and Multimodal Document Ranking for Textbook Question Answering | | 0 |
| Beyond Text: Implementing Multimodal Large Language Model-Powered Multi-Agent Systems Using a No-Code Platform | | 0 |
| BlueLM-2.5-3B Technical Report | | 0 |
| CadVLM: Bridging Language and Vision in the Generation of Parametric CAD Sketches | | 0 |
| CAFES: A Collaborative Multi-Agent Framework for Multi-Granular Multimodal Essay Scoring | | 0 |
| Can Multimodal Large Language Model Think Analogically? | | 0 |
| CapeLLM: Support-Free Category-Agnostic Pose Estimation with Multimodal Large Language Models | | 0 |
| CaRDiff: Video Salient Object Ranking Chain of Thought Reasoning for Saliency Prediction with Diffusion | | 0 |
| CFBenchmark-MM: Chinese Financial Assistant Benchmark for Multimodal Large Language Model | | 0 |
| ChatEXAONEPath: An Expert-level Multimodal Large Language Model for Histopathology Using Whole Slide Images | | 0 |
| ChatGPT Meets Iris Biometrics | | 0 |
| ChatSpot: Bootstrapping Multimodal LLMs via Precise Referring Instruction Tuning | | 0 |
| ChatTracker: Enhancing Visual Tracking Performance via Chatting with Multimodal Large Language Model | | 0 |
| Chat with AI: The Surprising Turn of Real-time Video Communication from Human to AI | | 0 |
| CINEMA: Coherent Multi-Subject Video Generation via MLLM-Based Guidance | | 0 |
| CleanMAP: Distilling Multimodal LLMs for Confidence-Driven Crowdsourced HD Map Updates | | 0 |
| CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering | | 0 |
| CLSP: High-Fidelity Contrastive Language-State Pre-training for Agent State Representation | | 0 |
| CoDi-2: In-Context, Interleaved, and Interactive Any-to-Any Generation | | 0 |
| CoDi-2: In-Context Interleaved and Interactive Any-to-Any Generation | | 0 |
| COEF-VQ: Cost-Efficient Video Quality Understanding through a Cascaded Multimodal LLM Framework | | 0 |
| Comics for Everyone: Generating Accessible Text Descriptions for Comic Strips | | 0 |

No leaderboard results yet.