
MME

MME is a comprehensive evaluation benchmark for multimodal large language models. It measures both perception and cognition abilities across 14 subtasks: existence, count, position, color, poster, celebrity, scene, landmark, artwork, OCR, commonsense reasoning, numerical calculation, text translation, and code reasoning.
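The 14 subtasks fall into the two ability groups named above. A minimal sketch of that taxonomy as a Python data structure (the perception/cognition split follows the MME paper; the dictionary layout itself is illustrative, not an official artifact of the benchmark):

```python
# Illustrative grouping of MME's 14 subtasks. Subtask names come from the
# benchmark description above; the two-category assignment is an assumption
# based on how MME reports perception vs. cognition scores.
MME_SUBTASKS = {
    "perception": [
        "existence", "count", "position", "color",
        "poster", "celebrity", "scene", "landmark",
        "artwork", "OCR",
    ],
    "cognition": [
        "commonsense reasoning", "numerical calculation",
        "text translation", "code reasoning",
    ],
}

# Sanity check: the groups together cover all 14 subtasks.
total = sum(len(tasks) for tasks in MME_SUBTASKS.values())
print(total)  # 14
```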

Papers

Showing 91–95 of 95 papers

Title | Status | Hype
Instruction-Aligned Visual Attention for Mitigating Hallucinations in Large Vision-Language Models | Code | 0
Expand VSR Benchmark for VLLM to Expertize in Spatial Rules | Code | 0
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions | Code | 0
Re-Imagining Multimodal Instruction Tuning: A Representation View | Code | 0
MM-GNN: Mix-Moment Graph Neural Network towards Modeling Neighborhood Feature Distribution | Code | 0

No leaderboard results yet.