SOTAVerified

MME

MME is a comprehensive evaluation benchmark for multimodal large language models. It measures perception and cognition abilities across 14 subtasks. The perception subtasks are existence, count, position, color, poster, celebrity, scene, landmark, artwork, and OCR; the cognition subtasks are commonsense reasoning, numerical calculation, text translation, and code reasoning.

Papers

Showing 51-75 of 95 papers

Title | Status | Hype
Logic-in-Frames: Dynamic Keyframe Search via Visual Semantic-Logical Verification for Long Video Understanding | | 0
Re-Imagining Multimodal Instruction Tuning: A Representation View | Code | 0
Ultra-High-Frequency Harmony: mmWave Radar and Event Camera Orchestrate Accurate Drone Landing | | 0
AIDE: Agentically Improve Visual Language Model with Domain Experts | | 0
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency | | 0
Hummingbird: High Fidelity Image Generation via Multimodal Context Alignment | | 0
Mitigating Hallucinations in Large Vision-Language Models with Internal Fact-based Contrastive Decoding | | 0
MME-Industry: A Cross-Industry Multimodal Evaluation Benchmark | | 0
Temporal Preference Optimization for Long-Form Video Understanding | | 0
Expand VSR Benchmark for VLLM to Expertize in Spatial Rules | Code | 0
GFormer: Accelerating Large Language Models with Optimized Transformers on Gaudi Processors | | 0
Apollo: An Exploration of Video Understanding in Large Multimodal Models | | 0
EACO: Enhancing Alignment in Multimodal LLMs via Critical Observation | | 0
Orthus: Autoregressive Interleaved Image-Text Generation with Modality-Specific Heads | | 0
SAVEn-Vid: Synergistic Audio-Visual Integration for Enhanced Understanding in Long Video Context | | 0
Enhancing Instruction-Following Capability of Visual-Language Models by Reducing Image Redundancy | | 0
The economic value of empowering older patients transitioning from hospital to home: Evidence from the 'Your Care Needs You' intervention | | 0
MME-Finance: A Multimodal Finance Benchmark for Expert-level Understanding and Reasoning | | 0
ZipVL: Efficient Large Vision-Language Models with Dynamic Token Sparsification | | 0
Temporal Reasoning Transfer from Text to Video | | 0
DAMRO: Dive into the Attention Mechanism of LVLM to Reduce Object Hallucination | | 0
TUBench: Benchmarking Large Vision-Language Models on Trustworthiness with Unanswerable Questions | Code | 0
MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? | | 0
Decoding Multilingual Moral Preferences: Unveiling LLM's Biases Through the Moral Machine Experiment | Code | 0
DrVideo: Document Retrieval Based Long Video Understanding | | 0
Page 3 of 4

No leaderboard results yet.