
MME

MME is a comprehensive evaluation benchmark for multimodal large language models. It measures perception and cognition abilities across 14 subtasks: existence, count, position, color, poster, celebrity, scene, landmark, artwork, OCR, commonsense reasoning, numerical calculation, text translation, and code reasoning.
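
As a rough illustration of how the 14 subtasks divide between the two abilities, the sketch below groups them in a small Python mapping. The subtask names come from the description above. The split into ten perception and four cognition subtasks is an assumption based on how the MME benchmark is commonly described, since this page does not state it. The names `MME_SUBTASKS`, `category_of`, and `total_subtasks` are hypothetical helpers, not part of any official MME tooling.

```python
# Subtask names are taken from the benchmark description; the grouping into
# "perception" and "cognition" is an assumption, not stated on this page.
MME_SUBTASKS = {
    "perception": [
        "existence", "count", "position", "color", "poster",
        "celebrity", "scene", "landmark", "artwork", "OCR",
    ],
    "cognition": [
        "commonsense reasoning", "numerical calculation",
        "text translation", "code reasoning",
    ],
}


def category_of(subtask: str) -> str:
    """Return the ability group for a subtask name (case-insensitive)."""
    wanted = subtask.strip().lower()
    for category, names in MME_SUBTASKS.items():
        if wanted in (name.lower() for name in names):
            return category
    raise KeyError(f"unknown MME subtask: {subtask!r}")


def total_subtasks() -> int:
    """Count subtasks across both ability groups."""
    return sum(len(names) for names in MME_SUBTASKS.values())
```

A lookup such as `category_of("OCR")` can be used to bucket per-subtask results before aggregating them by ability.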

Papers


Showing 31–40 of 95 papers

Title	Date	Tasks	Status	Hype	Score
Pensieve: Retrospect-then-Compare Mitigates Visual Hallucination	Mar 21, 2024	Hallucination, MME	Code Available	1	5
Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping	Oct 11, 2024	MME, Question Answering	Code Available	1	5
HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models	Mar 20, 2024	MME, Visual Question Answering	Code Available	1	5
Masked Motion Encoding for Self-Supervised Video Representation Learning	Oct 12, 2022	MME, Optical Flow Estimation	Code Available	1	5
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis	May 31, 2024	MME, Video MME	Code Available	1	5
What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Instruction Tuning	Nov 2, 2023	MME, Visual Reasoning	Code Available	1	5
Expand VSR Benchmark for VLLM to Expertize in Spatial Rules	Dec 24, 2024	MME, Sensitivity	Code Available	0	5
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions	Nov 21, 2023	Descriptive, MME	Code Available	0	5
Re-Imagining Multimodal Instruction Tuning: A Representation View	Mar 2, 2025	Instruction Following, MME	Code Available	0	5
MM-GNN: Mix-Moment Graph Neural Network towards Modeling Neighborhood Feature Distribution	Aug 15, 2022	Graph Neural Network, Graph Representation Learning	Code Available	0	5


No leaderboard results yet.