
MME

MME is a comprehensive evaluation benchmark for multimodal large language models. It measures perception and cognition abilities across 14 subtasks: ten perception subtasks (existence, count, position, color, poster, celebrity, scene, landmark, artwork, and OCR) and four cognition subtasks (commonsense reasoning, numerical calculation, text translation, and code reasoning).

Papers


Showing 1–10 of 95 papers

Title	Date	Tasks	Status	Hype
VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation	May 20, 2025	MME, Multiple-choice	Code Available	4
Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models	Apr 21, 2025	MME, Video MME	Code Available	4
Long Context Transfer from Language to Vision	Jun 24, 2024	Language Modeling, Language Modelling	Code Available	4
Flash-VStream: Efficient Real-Time Understanding for Long Video Streams	Jun 30, 2025	cross-modal alignment, EgoSchema	Code Available	3
TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos	Apr 24, 2025	MME, Video MME	Code Available	3
Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition	Dec 12, 2024	EgoSchema	Code Available	3
MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs	Nov 22, 2024	image-classification, Image Classification	Code Available	3
Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension	Nov 20, 2024	GPU, MME	Code Available	3
High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning	Jul 8, 2025	MME, Reinforcement Learning (RL)	Code Available	2
VideoDeepResearch: Long Video Understanding With Agentic Tool Using	Jun 12, 2025	MME, Video MME	Code Available	2



No leaderboard results yet.