MME

MME is a comprehensive evaluation benchmark for multimodal large language models. It measures both perception and cognition abilities on a total of 14 subtasks, including existence, count, position, color, poster, celebrity, scene, landmark, artwork, OCR, commonsense reasoning, numerical calculation, text translation, and code reasoning.

Papers

Showing 11–20 of 95 papers

Title	Date	Tasks	Status	Hype	Score
MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models	Jun 23, 2023	Benchmarking, Language Modeling	Code Available	2	5
Honeybee: Locality-enhanced Projector for Multimodal LLM	Dec 11, 2023	MME, Science Question Answering	Code Available	2	5
High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning	Jul 8, 2025	MME, Reinforcement Learning (RL)	Code Available	2	5
SpaceR: Reinforcing MLLMs in Video Spatial Reasoning	Apr 2, 2025	MME, Spatial Reasoning	Code Available	2	5
MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning	Sep 14, 2023	Hallucination, In-Context Learning	Code Available	2	5
Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions	Aug 8, 2023	Caption Generation, Image Captioning	Code Available	2	5
BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions	Aug 19, 2023	MME, Optical Character Recognition (OCR)	Code Available	2	5
Mitigating Hallucinations in Large Vision-Language Models with Instruction Contrastive Decoding	Mar 27, 2024	Attribute, Decision Making	Code Available	2	5
QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehension	Mar 11, 2025	AutoML, Decoder	Code Available	2	5
VideoDeepResearch: Long Video Understanding With Agentic Tool Using	Jun 12, 2025	MME, Video MME	Code Available	2	5

No leaderboard results yet.