MM-Vet

Papers

Showing 1–19 of 19 papers

| Title | Status | Hype |
| --- | --- | --- |
| CogVLM2: Visual Language Models for Image and Video Understanding | Code | 9 |
| CogAgent: A Visual Language Model for GUI Agents | Code | 5 |
| Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition | Code | 3 |
| MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities | Code | 3 |
| ShapeLLM: Universal 3D Object Understanding for Embodied Interaction | Code | 3 |
| Attention Prompting on Image for Large Vision-Language Models | Code | 2 |
| Self-Supervised Visual Preference Alignment | Code | 2 |
| To See is to Believe: Prompting GPT-4V for Better Visual Instruction Tuning | Code | 2 |
| MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities | Code | 2 |
| Mitigating Object Hallucinations via Sentence-Level Early Intervention | Code | 1 |
| Multi-modal Preference Alignment Remedies Degradation of Visual Instruction Tuning on Language Models | Code | 1 |
| Text as Images: Can Multimodal Large Language Models Follow Printed Instructions in Pixels? | Code | 1 |
| Volcano: Mitigating Multimodal Hallucination through Self-Feedback Guided Revision | Code | 1 |
| MR. Judge: Multimodal Reasoner as a Judge | — | 0 |
| EfficientLLaVA: Generalizable Auto-Pruning for Large Vision-language Models | — | 0 |
| OmniFusion Technical Report | Code | 0 |
| DIEM: Decomposition-Integration Enhancing Multimodal Insights | — | 0 |
| Enhancing the Spatial Awareness Capability of Multi-Modal Large Language Model | — | 0 |

No leaderboard results yet.