MME

MME is a comprehensive evaluation benchmark for multimodal large language models (MLLMs). It measures both perception and cognition abilities across 14 subtasks: existence, count, position, color, poster, celebrity, scene, landmark, artwork, OCR, commonsense reasoning, numerical calculation, text translation, and code reasoning.

Papers

Showing 26–50 of 95 papers

Title	Date	Tasks	Status	Hype
QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehension	Mar 11, 2025	AutoML, Decoder	Code Available	2
Re-Imagining Multimodal Instruction Tuning: A Representation View	Mar 2, 2025	Instruction Following, MME	Code Available	0
Ultra-High-Frequency Harmony: mmWave Radar and Event Camera Orchestrate Accurate Drone Landing	Feb 20, 2025	MME, Sensor Fusion	Unverified	0
Towards Text-Image Interleaved Retrieval	Feb 18, 2025	Information Retrieval, Language Modeling	Code Available	1
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency	Feb 13, 2025	Benchmarking, Math	Unverified	0
AIDE: Agentically Improve Visual Language Model with Domain Experts	Feb 13, 2025	Knowledge Distillation, Language Modeling	Unverified	0
Hummingbird: High Fidelity Image Generation via Multimodal Context Alignment	Feb 7, 2025	Diversity, Human-Object Interaction Detection	Unverified	0
Mitigating Hallucinations in Large Vision-Language Models with Internal Fact-based Contrastive Decoding	Feb 3, 2025	Attribute, MME	Unverified	0
MME-Industry: A Cross-Industry Multimodal Evaluation Benchmark	Jan 28, 2025	MME, Model Optimization	Unverified	0
Temporal Preference Optimization for Long-Form Video Understanding	Jan 23, 2025	Form, MME	Unverified	0
Expand VSR Benchmark for VLLM to Expertize in Spatial Rules	Dec 24, 2024	MME, Sensitivity	Code Available	0
GFormer: Accelerating Large Language Models with Optimized Transformers on Gaudi Processors	Dec 19, 2024	MME	Unverified	0
Apollo: An Exploration of Video Understanding in Large Multimodal Models	Dec 13, 2024	MME, Video MME	Unverified	0
Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition	Dec 12, 2024	EgoSchema	Code Available	3
EACO: Enhancing Alignment in Multimodal LLMs via Critical Observation	Dec 6, 2024	MME, Question Answering	Unverified	0
Orthus: Autoregressive Interleaved Image-Text Generation with Modality-Specific Heads	Nov 28, 2024	GPU, Language Modeling	Unverified	0
SAVEn-Vid: Synergistic Audio-Visual Integration for Enhanced Understanding in Long Video Context	Nov 25, 2024	Large Language Model, MME	Unverified	0
Enhancing Instruction-Following Capability of Visual-Language Models by Reducing Image Redundancy	Nov 23, 2024	Instruction Following, MME	Unverified	0
MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs	Nov 22, 2024	image-classification, Image Classification	Code Available	3
Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension	Nov 20, 2024	GPU, MME	Code Available	3
The economic value of empowering older patients transitioning from hospital to home: Evidence from the 'Your Care Needs You' intervention	Nov 7, 2024	MME, Sensitivity	Unverified	0
MME-Finance: A Multimodal Finance Benchmark for Expert-level Understanding and Reasoning	Nov 5, 2024	MME, Question Answering	Unverified	0
ZipVL: Efficient Large Vision-Language Models with Dynamic Token Sparsification	Oct 11, 2024	MME, Quantization	Unverified	0
Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping	Oct 11, 2024	MME, Question Answering	Code Available	1
To Preserve or To Compress: An In-Depth Study of Connector Selection in Multimodal Large Language Models	Oct 9, 2024	MME	Code Available	1


No leaderboard results yet.