
MME

MME is a comprehensive evaluation benchmark for multimodal large language models. It measures both perception and cognition abilities across 14 subtasks: existence, count, position, color, poster, celebrity, scene, landmark, artwork, and OCR (perception), plus commonsense reasoning, numerical calculation, text translation, and code reasoning (cognition).
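
For context on how MME numbers are typically computed: under the benchmark's standard protocol, every test image is paired with two yes/no questions, and each subtask is scored as accuracy (question level) plus accuracy+ (the fraction of images with both questions answered correctly), for a 200-point maximum per subtask; the perception and cognition totals then max out at 2000 and 800. Below is a minimal Python sketch of this scoring, assuming per-question correctness records; the mme_scores helper and its record format are illustrative, not official MME tooling.

```python
from collections import defaultdict

# Subtask grouping as listed above (10 perception + 4 cognition subtasks).
PERCEPTION = {"existence", "count", "position", "color", "poster", "celebrity",
              "scene", "landmark", "artwork", "OCR"}
COGNITION = {"commonsense reasoning", "numerical calculation",
             "text translation", "code reasoning"}

def mme_scores(records):
    """records: iterable of (subtask, image_id, is_correct) triples,
    one per yes/no question (MME asks two questions per image)."""
    # Collect per-image answer lists within each subtask.
    by_image = defaultdict(list)
    for subtask, image_id, ok in records:
        by_image[(subtask, image_id)].append(bool(ok))

    grouped = defaultdict(list)
    for (subtask, _), answers in by_image.items():
        grouped[subtask].append(answers)

    subtask_scores = {}
    for subtask, images in grouped.items():
        total_q = sum(len(a) for a in images)
        acc = sum(sum(a) for a in images) / total_q           # question-level accuracy
        acc_plus = sum(all(a) for a in images) / len(images)  # both questions correct
        subtask_scores[subtask] = 100 * (acc + acc_plus)      # max 200 per subtask

    perception = sum(v for k, v in subtask_scores.items() if k in PERCEPTION)
    cognition = sum(v for k, v in subtask_scores.items() if k in COGNITION)
    return subtask_scores, perception, cognition              # maxima: 2000 / 800
```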

Papers

Showing 26–50 of 95 papers

Title | Status | Hype
ParGo: Bridging Vision-Language with Partial and Global Views | Code | 1
Pensieve: Retrospect-then-Compare Mitigates Visual Hallucination | Code | 1
Prompt Highlighter: Interactive Control for Multi-Modal LLMs | Code | 1
BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding | Code | 1
Semi-supervised Domain Adaptation via Minimax Entropy | Code | 1
Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping | Code | 1
SiLVR: A Simple Language-based Video Reasoning Framework | Code | 1
Masked Motion Encoding for Self-Supervised Video Representation Learning | Code | 1
To Preserve or To Compress: An In-Depth Study of Connector Selection in Multimodal Large Language Models | Code | 1
Towards Text-Image Interleaved Retrieval | Code | 1
What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Instruction Tuning | Code | 1
MM-GNN: Mix-Moment Graph Neural Network towards Modeling Neighborhood Feature Distribution | Code | 0
TUBench: Benchmarking Large Vision-Language Models on Trustworthiness with Unanswerable Questions | Code | 0
MAAL: Multimodality-Aware Autoencoder-Based Affordance Learning for 3D Articulated Objects | Code | 0
Re-Imagining Multimodal Instruction Tuning: A Representation View | Code | 0
Joint Visual and Text Prompting for Improved Object-Centric Perception with Multimodal Large Language Models | Code | 0
Expand VSR Benchmark for VLLM to Expertize in Spatial Rules | Code | 0
InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition | Code | 0
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions | Code | 0
Instruction-Aligned Visual Attention for Mitigating Hallucinations in Large Vision-Language Models | Code | 0
A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise | Code | 0
Decoding Multilingual Moral Preferences: Unveiling LLM's Biases Through the Moral Machine Experiment | Code | 0
VISTA: Enhancing Vision-Text Alignment in MLLMs via Cross-Modal Mutual Information Maximization | - | 0
ZipVL: Efficient Large Vision-Language Models with Dynamic Token Sparsification | - | 0
AIDE: Agentically Improve Visual Language Model with Domain Experts | - | 0
Page 2 of 4

No leaderboard results yet.