
MME

MME is a comprehensive evaluation benchmark for multimodal large language models (MLLMs). It measures both perception and cognition abilities across 14 subtasks. The ten perception subtasks are existence, count, position, color, poster, celebrity, scene, landmark, artwork, and OCR. The four cognition subtasks are commonsense reasoning, numerical calculation, text translation, and code reasoning.

Papers

Showing 31–40 of 95 papers

Title | Status | Hype
Pensieve: Retrospect-then-Compare Mitigates Visual Hallucination | Code | 1
Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping | Code | 1
HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models | Code | 1
Masked Motion Encoding for Self-Supervised Video Representation Learning | Code | 1
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis | Code | 1
What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Instruction Tuning | Code | 1
Expand VSR Benchmark for VLLM to Expertize in Spatial Rules | Code | 0
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions | Code | 0
Re-Imagining Multimodal Instruction Tuning: A Representation View | Code | 0
MM-GNN: Mix-Moment Graph Neural Network towards Modeling Neighborhood Feature Distribution | Code | 0

No leaderboard results yet.