
MME

MME is a comprehensive evaluation benchmark for multimodal large language models. It measures both perception and cognition abilities on a total of 14 subtasks, including existence, count, position, color, poster, celebrity, scene, landmark, artwork, OCR, commonsense reasoning, numerical calculation, text translation, and code reasoning.

Papers

Showing 21–30 of 95 papers

Title | Status | Hype
BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions | Code | 2
FRAG: Frame Selection Augmented Generation for Long Video and Long Document Understanding | Code | 1
Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization | Code | 1
ParGo: Bridging Vision-Language with Partial and Global Views | Code | 1
Masked Motion Encoding for Self-Supervised Video Representation Learning | Code | 1
To Preserve or To Compress: An In-Depth Study of Connector Selection in Multimodal Large Language Models | Code | 1
Prompt Highlighter: Interactive Control for Multi-Modal LLMs | Code | 1
Pensieve: Retrospect-then-Compare Mitigates Visual Hallucination | Code | 1
HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models | Code | 1
Semi-supervised Domain Adaptation via Minimax Entropy | Code | 1
Page 3 of 10

No leaderboard results yet.