SOTAVerified|Agents Browse Leaderboard About Blog

MME

MME is a comprehensive evaluation benchmark for multimodal large language models. It measures both perception and cognition abilities on a total of 14 subtasks, including existence, count, position, color, poster, celebrity, scene, landmark, artwork, OCR, commonsense reasoning, numerical calculation, text translation, and code reasoning.

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 26–50 of 95 papers

Title	Date	Tasks	Status	Hype
Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping	Oct 11, 2024	MMEQuestion Answering	CodeCode Available	1
FRAG: Frame Selection Augmented Generation for Long Video and Long Document Understanding	Apr 24, 2025	document understandingMME	CodeCode Available	1
BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding	Mar 27, 2025	FormLanguage Modeling	CodeCode Available	1
Semi-supervised Domain Adaptation via Minimax Entropy	Apr 13, 2019	Domain AdaptationMME	CodeCode Available	1
HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models	Mar 20, 2024	MMEVisual Question Answering	CodeCode Available	1
Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization	Nov 28, 2023	HallucinationMME	CodeCode Available	1
Towards Text-Image Interleaved Retrieval	Feb 18, 2025	Information RetrievalLanguage Modeling	CodeCode Available	1
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis	May 31, 2024	MMEVideo MME	CodeCode Available	1
What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Instruction Tuning	Nov 2, 2023	MMEVisual Reasoning	CodeCode Available	1
ParGo: Bridging Vision-Language with Partial and Global Views	Aug 23, 2024	MME	CodeCode Available	1
Pensieve: Retrospect-then-Compare Mitigates Visual Hallucination	Mar 21, 2024	HallucinationMME	CodeCode Available	1
Online Meta-Learning for Multi-Source and Semi-Supervised Domain Adaptation	Apr 9, 2020	Domain AdaptationMeta-Learning	—Unverified	0
Orthus: Autoregressive Interleaved Image-Text Generation with Modality-Specific Heads	Nov 28, 2024	GPULanguage Modeling	—Unverified	0
Q-Frame: Query-aware Frame Selection and Multi-Resolution Adaptation for Video-LLMs	Jun 27, 2025	MMEVideo MME	—Unverified	0
RITUAL: Random Image Transformations as a Universal Anti-hallucination Lever in Large Vision Language Models	May 28, 2024	HallucinationMME	—Unverified	0
SAVEn-Vid: Synergistic Audio-Visual Integration for Enhanced Understanding in Long Video Context	Nov 25, 2024	Large Language ModelMME	—Unverified	0
Scalable K-Medoids via True Error Bound and Familywise Bandits	May 27, 2019	ClusteringMME	—Unverified	0
Silkie: Preference Distillation for Large Visual Language Models	Dec 17, 2023	HallucinationMME	—Unverified	0
Temporal Preference Optimization for Long-Form Video Understanding	Jan 23, 2025	FormMME	—Unverified	0
Temporal Reasoning Transfer from Text to Video	Oct 8, 2024	DiagnosticMME	—Unverified	0
The Use of Symmetry for Models with Variable-size Variables	Nov 15, 2023	MME	—Unverified	0
ZipVL: Efficient Large Vision-Language Models with Dynamic Token Sparsification	Oct 11, 2024	MMEQuantization	—Unverified	0
A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise	Dec 19, 2023	MMEVisual Reasoning	—Unverified	0
AIDE: Agentically Improve Visual Language Model with Domain Experts	Feb 13, 2025	Knowledge DistillationLanguage Modeling	—Unverified	0
An LMM for Efficient Video Understanding via Reinforced Compression of Video Cubes	Apr 21, 2025	MMEVideo MME	—Unverified	0

Show:10 25 50

← PrevPage 2 of 4Next →

No leaderboard results yet.