SOTAVerified

MME

MME is a comprehensive evaluation benchmark for multimodal large language models. It measures perception and cognition abilities across 14 subtasks: existence, count, position, color, poster, celebrity, scene, landmark, artwork, OCR, commonsense reasoning, numerical calculation, text translation, and code reasoning.

Papers

Showing 76–95 of 95 papers

| Title | Status | Hype |
| --- | --- | --- |
| MME-CRS: Multi-Metric Evaluation Based on Correlation Re-Scaling for Evaluating Open-Domain Dialogue | | 0 |
| MME-Finance: A Multimodal Finance Benchmark for Expert-level Understanding and Reasoning | | 0 |
| MME-Industry: A Cross-Industry Multimodal Evaluation Benchmark | | 0 |
| MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? | | 0 |
| MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs | | 0 |
| MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models | | 0 |
| Multi-Modal Evaluation Approach for Medical Image Segmentation | | 0 |
| Ultra-High-Frequency Harmony: mmWave Radar and Event Camera Orchestrate Accurate Drone Landing | | 0 |
| Visual Instruction Tuning with Chain of Region-of-Interest | | 0 |
| VISTA: Enhancing Vision-Text Alignment in MLLMs via Cross-Modal Mutual Information Maximization | | 0 |
| TUBench: Benchmarking Large Vision-Language Models on Trustworthiness with Unanswerable Questions | Code | 0 |
| Decoding Multilingual Moral Preferences: Unveiling LLM's Biases Through the Moral Machine Experiment | Code | 0 |
| MAAL: Multimodality-Aware Autoencoder-Based Affordance Learning for 3D Articulated Objects | Code | 0 |
| Joint Visual and Text Prompting for Improved Object-Centric Perception with Multimodal Large Language Models | Code | 0 |
| InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition | Code | 0 |
| Instruction-Aligned Visual Attention for Mitigating Hallucinations in Large Vision-Language Models | Code | 0 |
| Expand VSR Benchmark for VLLM to Expertize in Spatial Rules | Code | 0 |
| ShareGPT4V: Improving Large Multi-Modal Models with Better Captions | Code | 0 |
| Re-Imagining Multimodal Instruction Tuning: A Representation View | Code | 0 |
| MM-GNN: Mix-Moment Graph Neural Network towards Modeling Neighborhood Feature Distribution | Code | 0 |

No leaderboard results yet.