
MME

MME is a comprehensive evaluation benchmark for multimodal large language models. It measures both perception and cognition abilities across 14 subtasks: existence, count, position, color, poster, celebrity, scene, landmark, artwork, and OCR (perception), plus commonsense reasoning, numerical calculation, text translation, and code reasoning (cognition).
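As a rough illustration of how MME scores a subtask (a sketch assuming the protocol from the MME paper, not code from this page): each subtask asks two yes/no questions per image, and the subtask score combines plain question-level accuracy with "accuracy+", the fraction of images where both questions are answered correctly. The function name `mme_subtask_score` is hypothetical.

```python
# Hedged sketch of MME-style subtask scoring, assuming the protocol
# described in the MME paper: two yes/no questions per image.
#   accuracy  = fraction of individual questions answered correctly
#   accuracy+ = fraction of images with BOTH questions correct
#   score     = 100 * (accuracy + accuracy+), so the maximum is 200.

def mme_subtask_score(results):
    """results: list of (q1_correct, q2_correct) booleans, one pair per image."""
    n_images = len(results)
    n_correct = sum(int(a) + int(b) for a, b in results)   # question-level hits
    n_both = sum(1 for a, b in results if a and b)          # image-level hits
    accuracy = n_correct / (2 * n_images)
    accuracy_plus = n_both / n_images
    return 100 * (accuracy + accuracy_plus)

# Example: 3 images — both right, one right, none right.
score = mme_subtask_score([(True, True), (True, False), (False, False)])
```

With the example above, accuracy is 3/6 and accuracy+ is 1/3, giving a score of about 83.3 out of a possible 200 for the subtask.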

Papers

Showing 11–20 of 95 papers

Title | Status | Hype
L4DR: LiDAR-4DRadar Fusion for Weather-Robust 3D Object Detection | Code | 2
Honeybee: Locality-enhanced Projector for Multimodal LLM | Code | 2
MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models | Code | 2
Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality | Code | 2
SpaceR: Reinforcing MLLMs in Video Spatial Reasoning | Code | 2
MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning | Code | 2
Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions | Code | 2
BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions | Code | 2
High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning | Code | 2
QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehension | Code | 2
Page 2 of 10

No leaderboard results yet.