
MME

MME is a comprehensive evaluation benchmark for multimodal large language models. It measures both perception and cognition abilities across 14 subtasks: existence, count, position, color, poster, celebrity, scene, landmark, artwork, and OCR (perception), plus commonsense reasoning, numerical calculation, text translation, and code reasoning (cognition).
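As a rough illustration of how MME scores a subtask (a sketch assuming the protocol from the MME paper, not code from this page): each subtask asks two yes/no questions per image, and the subtask score combines plain question-level accuracy with "accuracy+", the fraction of images where both questions are answered correctly. The function name `mme_subtask_score` is hypothetical.

```python
# Hedged sketch of MME-style subtask scoring, assuming the protocol
# described in the MME paper: two yes/no questions per image.
#   accuracy  = fraction of individual questions answered correctly
#   accuracy+ = fraction of images with BOTH questions correct
#   score     = 100 * (accuracy + accuracy+), so the maximum is 200.

def mme_subtask_score(results):
    """results: list of (q1_correct, q2_correct) booleans, one pair per image."""
    n_images = len(results)
    n_correct = sum(int(a) + int(b) for a, b in results)   # question-level hits
    n_both = sum(1 for a, b in results if a and b)          # image-level hits
    accuracy = n_correct / (2 * n_images)
    accuracy_plus = n_both / n_images
    return 100 * (accuracy + accuracy_plus)

# Example: 3 images — both right, one right, none right.
score = mme_subtask_score([(True, True), (True, False), (False, False)])
```

With the example above, accuracy is 3/6 and accuracy+ is 1/3, giving a score of about 83.3 out of a possible 200 for the subtask.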

Papers

Showing 11–20 of 95 papers

Title | Status | Hype
L4DR: LiDAR-4DRadar Fusion for Weather-Robust 3D Object Detection | Code | 2
Honeybee: Locality-enhanced Projector for Multimodal LLM | Code | 2
MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models | Code | 2
Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality | Code | 2
SpaceR: Reinforcing MLLMs in Video Spatial Reasoning | Code | 2
MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning | Code | 2
Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions | Code | 2
BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions | Code | 2
High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning | Code | 2
QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehension | Code | 2
Page 2 of 10

No leaderboard results yet.