MME

MME is a comprehensive evaluation benchmark for multimodal large language models. It measures both perception and cognition abilities across 14 subtasks: existence, count, position, color, poster, celebrity, scene, landmark, artwork, OCR, commonsense reasoning, numerical calculation, text translation, and code reasoning.

Papers

Showing 1–25 of 95 papers

Title	Date	Tasks	Status	Hype	Score
VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation	May 20, 2025	MME, Multiple-choice	Code Available	4	5
Long Context Transfer from Language to Vision	Jun 24, 2024	Language Modeling, Language Modelling	Code Available	4	5
Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models	Apr 21, 2025	MME, Video MME	Code Available	4	5
Flash-VStream: Efficient Real-Time Understanding for Long Video Streams	Jun 30, 2025	cross-modal alignment, EgoSchema	Code Available	3	5
Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension	Nov 20, 2024	GPU, MME	Code Available	3	5
Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition	Dec 12, 2024	EgoSchema	Code Available	3	5
MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs	Nov 22, 2024	image-classification, Image Classification	Code Available	3	5
TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos	Apr 24, 2025	MME, Video MME	Code Available	3	5
Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality	Oct 7, 2024	Causal Inference, counterfactual	Code Available	2	5
L4DR: LiDAR-4DRadar Fusion for Weather-Robust 3D Object Detection	Aug 7, 2024	3D Object Detection, Autonomous Navigation	Code Available	2	5
QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehension	Mar 11, 2025	AutoML, Decoder	Code Available	2	5
MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models	Jun 23, 2023	Benchmarking, Language Modeling	Code Available	2	5
MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning	Sep 14, 2023	Hallucination, In-Context Learning	Code Available	2	5
Mitigating Hallucinations in Large Vision-Language Models with Instruction Contrastive Decoding	Mar 27, 2024	Attribute, Decision Making	Code Available	2	5
Honeybee: Locality-enhanced Projector for Multimodal LLM	Dec 11, 2023	MME, Science Question Answering	Code Available	2	5
VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos	May 29, 2024	EgoSchema, MME	Code Available	2	5
BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions	Aug 19, 2023	MME, Optical Character Recognition (OCR)	Code Available	2	5
SpaceR: Reinforcing MLLMs in Video Spatial Reasoning	Apr 2, 2025	MME, Spatial Reasoning	Code Available	2	5
High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning	Jul 8, 2025	MME, Reinforcement Learning (RL)	Code Available	2	5
Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions	Aug 8, 2023	Caption Generation, Image Captioning	Code Available	2	5
VideoDeepResearch: Long Video Understanding With Agentic Tool Using	Jun 12, 2025	MME, Video MME	Code Available	2	5
Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization	Nov 28, 2023	Hallucination, MME	Code Available	1	5
Masked Motion Encoding for Self-Supervised Video Representation Learning	Oct 12, 2022	MME, Optical Flow Estimation	Code Available	1	5
HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models	Mar 20, 2024	MME, Visual Question Answering	Code Available	1	5
ParGo: Bridging Vision-Language with Partial and Global Views	Aug 23, 2024	MME	Code Available	1	5

