
MME

MME is a comprehensive evaluation benchmark for multimodal large language models. It measures both perception and cognition abilities across 14 subtasks: existence, count, position, color, poster, celebrity, scene, landmark, artwork, OCR, commonsense reasoning, numerical calculation, text translation, and code reasoning.
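
The description above does not say which subtasks count as perception and which as cognition. As a minimal sketch, the grouping below follows the split used in the original MME paper (ten perception subtasks, four cognition subtasks). The names MME_SUBTASKS and ability_of are hypothetical and chosen only for illustration; they are not part of any official MME tooling.

```python
# Hypothetical lookup table: MME subtasks grouped by the ability they probe.
# The perception/cognition split is taken from the MME paper, not from this page.
MME_SUBTASKS = {
    "perception": [
        "existence", "count", "position", "color",
        "poster", "celebrity", "scene", "landmark", "artwork",
        "OCR",
    ],
    "cognition": [
        "commonsense reasoning", "numerical calculation",
        "text translation", "code reasoning",
    ],
}


def ability_of(subtask: str) -> str:
    """Return the ability group ('perception' or 'cognition') for a subtask."""
    for ability, names in MME_SUBTASKS.items():
        if subtask in names:
            return ability
    raise KeyError(f"unknown MME subtask: {subtask}")


# Sanity check: the two groups together cover all 14 subtasks.
total = sum(len(names) for names in MME_SUBTASKS.values())
print(total)
print(ability_of("OCR"))
```

A table like this can be handy when aggregating per-subtask results into separate perception and cognition totals.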

Papers


Showing 26–50 of 95 papers

Title	Date	Tasks	Status	Hype
Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping	Oct 11, 2024	MME, Question Answering	Code Available	1
To Preserve or To Compress: An In-Depth Study of Connector Selection in Multimodal Large Language Models	Oct 9, 2024	MME	Code Available	1
ParGo: Bridging Vision-Language with Partial and Global Views	Aug 23, 2024	MME	Code Available	1
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis	May 31, 2024	MME, Video MME	Code Available	1
Pensieve: Retrospect-then-Compare Mitigates Visual Hallucination	Mar 21, 2024	Hallucination, MME	Code Available	1
HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models	Mar 20, 2024	MME, Visual Question Answering	Code Available	1
Prompt Highlighter: Interactive Control for Multi-Modal LLMs	Dec 7, 2023	MME, Text Generation	Code Available	1
Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization	Nov 28, 2023	Hallucination, MME	Code Available	1
What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Instruction Tuning	Nov 2, 2023	MME, Visual Reasoning	Code Available	1
Masked Motion Encoding for Self-Supervised Video Representation Learning	Oct 12, 2022	MME, Optical Flow Estimation	Code Available	1
Semi-supervised Domain Adaptation via Minimax Entropy	Apr 13, 2019	Domain Adaptation, MME	Code Available	1
Q-Frame: Query-aware Frame Selection and Multi-Resolution Adaptation for Video-LLMs	Jun 27, 2025	MME, Video MME	Unverified	0
Language-Vision Planner and Executor for Text-to-Visual Reasoning	Jun 9, 2025	In-Context Learning, MME	Unverified	0
DynTok: Dynamic Compression of Visual Tokens for Efficient and Effective Video Understanding	Jun 4, 2025	MME, Video MME	Unverified	0
Fast or Slow? Integrating Fast Intuition and Deliberate Thinking for Enhancing Visual Question Answering	Jun 1, 2025	All, MME	Unverified	0
EvoMoE: Expert Evolution in Mixture of Experts for Multimodal Large Language Models	May 28, 2025	Mixture-of-Experts, MME	Unverified	0
MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs	May 27, 2025	Logical Reasoning, MME	Unverified	0
Enhancing Visual Reliance in Text Generation: A Bayesian Perspective on Mitigating Hallucination in Large Vision-Language Models	May 26, 2025	Hallucination, MME	Unverified	0
Mitigating Hallucinations via Inter-Layer Consistency Aggregation in Large Vision-Language Models	May 18, 2025	Hallucination, MME	Unverified	0
VISTA: Enhancing Vision-Text Alignment in MLLMs via Cross-Modal Mutual Information Maximization	May 16, 2025	Cross-Modal Alignment, MME	Unverified	0
Visual Instruction Tuning with Chain of Region-of-Interest	May 11, 2025	MME	Unverified	0
An LMM for Efficient Video Understanding via Reinforced Compression of Video Cubes	Apr 21, 2025	MME, Video MME	Unverified	0
MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models	Apr 4, 2025	Benchmarking, Image Generation	Unverified	0
Instruction-Aligned Visual Attention for Mitigating Hallucinations in Large Vision-Language Models	Mar 24, 2025	MME, TextVQA	Code Available	0
Improving LLM Video Understanding with 16 Frames Per Second	Mar 18, 2025	MME, Video MME	Unverified	0



No leaderboard results yet.