
MME

MME is a comprehensive evaluation benchmark for multimodal large language models. It measures both perception and cognition abilities on a total of 14 subtasks, including existence, count, position, color, poster, celebrity, scene, landmark, artwork, OCR, commonsense reasoning, numerical calculation, text translation, and code reasoning.

Papers

Showing 21–30 of 95 papers

Title | Status | Hype
BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions | Code | 2
FRAG: Frame Selection Augmented Generation for Long Video and Long Document Understanding | Code | 1
Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization | Code | 1
ParGo: Bridging Vision-Language with Partial and Global Views | Code | 1
Masked Motion Encoding for Self-Supervised Video Representation Learning | Code | 1
To Preserve or To Compress: An In-Depth Study of Connector Selection in Multimodal Large Language Models | Code | 1
Prompt Highlighter: Interactive Control for Multi-Modal LLMs | Code | 1
Pensieve: Retrospect-then-Compare Mitigates Visual Hallucination | Code | 1
HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models | Code | 1
Semi-supervised Domain Adaptation via Minimax Entropy | Code | 1
Page 3 of 10

No leaderboard results yet.