SOTAVerified

MME

MME is a comprehensive evaluation benchmark for multimodal large language models. It measures both perception and cognition abilities across 14 subtasks: existence, count, position, color, poster, celebrity, scene, landmark, artwork, OCR, commonsense reasoning, numerical calculation, text translation, and code reasoning.

Papers

Showing 21–30 of 95 papers

Title | Status | Hype
L4DR: LiDAR-4DRadar Fusion for Weather-Robust 3D Object Detection | Code | 2
ParGo: Bridging Vision-Language with Partial and Global Views | Code | 1
FRAG: Frame Selection Augmented Generation for Long Video and Long Document Understanding | Code | 1
Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization | Code | 1
Pensieve: Retrospect-then-Compare Mitigates Visual Hallucination | Code | 1
Masked Motion Encoding for Self-Supervised Video Representation Learning | Code | 1
To Preserve or To Compress: An In-Depth Study of Connector Selection in Multimodal Large Language Models | Code | 1
Prompt Highlighter: Interactive Control for Multi-Modal LLMs | Code | 1
Semi-supervised Domain Adaptation via Minimax Entropy | Code | 1
BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding | Code | 1

No leaderboard results yet.