
MME

MME is a comprehensive evaluation benchmark for multimodal large language models. It measures both perception and cognition abilities across 14 subtasks: existence, count, position, color, poster, celebrity, scene, landmark, artwork, OCR, commonsense reasoning, numerical calculation, text translation, and code reasoning.
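As background on how MME results are typically computed (this scoring detail comes from the original MME paper, not this page): each image is paired with two yes/no questions, and a subtask's score is per-question accuracy plus a stricter "accuracy+" that counts an image only if both of its questions are answered correctly, giving a subtask maximum of 200. A minimal sketch of that scheme:

```python
def mme_subtask_score(results):
    """Compute an MME-style subtask score.

    `results` maps image id -> (correct_q1, correct_q2): booleans for the
    two yes/no questions asked about each image.
    Returns acc + acc+, both as percentages (subtask maximum: 200.0).
    """
    n_images = len(results)
    # acc: fraction of individual questions answered correctly
    correct_q = sum(int(a) + int(b) for a, b in results.values())
    acc = 100.0 * correct_q / (2 * n_images)
    # acc+: fraction of images with BOTH questions answered correctly
    acc_plus = 100.0 * sum(a and b for a, b in results.values()) / n_images
    return acc + acc_plus
```

For example, one image with both questions right and one with a single question right gives acc = 75, acc+ = 50, so a score of 125 out of 200.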

Papers

Showing 51–75 of 95 papers

| Title | Code | Hype |
|---|---|---|
| Temporal Reasoning Transfer from Text to Video | — | 0 |
| Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality | Code | 2 |
| DAMRO: Dive into the Attention Mechanism of LVLM to Reduce Object Hallucination | — | 0 |
| TUBench: Benchmarking Large Vision-Language Models on Trustworthiness with Unanswerable Questions | Code | 0 |
| ParGo: Bridging Vision-Language with Partial and Global Views | Code | 1 |
| MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? | — | 0 |
| L4DR: LiDAR-4DRadar Fusion for Weather-Robust 3D Object Detection | Code | 2 |
| Decoding Multilingual Moral Preferences: Unveiling LLM's Biases Through the Moral Machine Experiment | Code | 0 |
| Long Context Transfer from Language to Vision | Code | 4 |
| DrVideo: Document Retrieval Based Long Video Understanding | — | 0 |
| Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis | Code | 1 |
| VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos | Code | 2 |
| RITUAL: Random Image Transformations as a Universal Anti-hallucination Lever in Large Vision Language Models | — | 0 |
| Don't Miss the Forest for the Trees: Attentional Vision Calibration for Large Vision Language Models | — | 0 |
| Joint Visual and Text Prompting for Improved Object-Centric Perception with Multimodal Large Language Models | Code | 0 |
| Mitigating Hallucinations in Large Vision-Language Models with Instruction Contrastive Decoding | Code | 2 |
| Pensieve: Retrospect-then-Compare Mitigates Visual Hallucination | Code | 1 |
| HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models | Code | 1 |
| A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise | Code | 0 |
| Silkie: Preference Distillation for Large Visual Language Models | — | 0 |
| Honeybee: Locality-enhanced Projector for Multimodal LLM | Code | 2 |
| Prompt Highlighter: Interactive Control for Multi-Modal LLMs | Code | 1 |
| Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization | Code | 1 |
| ShareGPT4V: Improving Large Multi-Modal Models with Better Captions | Code | 0 |
| The Use of Symmetry for Models with Variable-size Variables | — | 0 |
Page 3 of 4

No leaderboard results yet.