
MME

MME is a comprehensive evaluation benchmark for multimodal large language models. It measures both perception and cognition abilities on a total of 14 subtasks, including existence, count, position, color, poster, celebrity, scene, landmark, artwork, OCR, commonsense reasoning, numerical calculation, text translation, and code reasoning.

Papers

Showing 71–80 of 95 papers

Title | Status | Hype
Honeybee: Locality-enhanced Projector for Multimodal LLM | Code | 2
Prompt Highlighter: Interactive Control for Multi-Modal LLMs | Code | 1
Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization | Code | 1
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions | Code | 0
The Use of Symmetry for Models with Variable-size Variables | - | 0
What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Instruction Tuning | Code | 1
Enhancing the Spatial Awareness Capability of Multi-Modal Large Language Model | - | 0
Benchmarking and In-depth Performance Study of Large Language Models on Habana Gaudi Processors | - | 0
InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition | Code | 0
MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning | Code | 2
Page 8 of 10

No leaderboard results yet.