
MM-Vet

Papers


Showing 1–19 of 19 papers

Title	Date	Tasks	Status	Hype
CogVLM2: Visual Language Models for Image and Video Understanding	Aug 29, 2024	MM-Vet, MVBench	Code Available	9
CogAgent: A Visual Language Model for GUI Agents	Dec 14, 2023	Language Modeling	Code Available	5
Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition	Dec 12, 2024	EgoSchema	Code Available	3
MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities	Aug 1, 2024	Math, MM-Vet	Code Available	3
ShapeLLM: Universal 3D Object Understanding for Embodied Interaction	Feb 27, 2024	3D geometry, 3D Object Captioning	Code Available	3
Attention Prompting on Image for Large Vision-Language Models	Sep 25, 2024	MM-Vet, Visual Prompting	Code Available	2
Self-Supervised Visual Preference Alignment	Apr 16, 2024	8k, MM-Vet	Code Available	2
To See is to Believe: Prompting GPT-4V for Better Visual Instruction Tuning	Nov 13, 2023	Instruction Following, MM-Vet	Code Available	2
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities	Aug 4, 2023	Math, MM-Vet	Code Available	2
Mitigating Object Hallucinations via Sentence-Level Early Intervention	Jul 16, 2025	Hallucination, MM-Vet	Code Available	1
Multi-modal Preference Alignment Remedies Degradation of Visual Instruction Tuning on Language Models	Feb 16, 2024	Diversity, Instruction Following	Code Available	1
Text as Images: Can Multimodal Large Language Models Follow Printed Instructions in Pixels?	Nov 29, 2023	In-Context Learning, Instruction Following	Code Available	1
Volcano: Mitigating Multimodal Hallucination through Self-Feedback Guided Revision	Nov 13, 2023	Hallucination, MM-Vet	Code Available	1
MR. Judge: Multimodal Reasoner as a Judge	May 19, 2025	MM-Vet, Multiple-choice	Unverified	0
EfficientLLaVA: Generalizable Auto-Pruning for Large Vision-language Models	Mar 19, 2025	MM-Vet, Multimodal Reasoning	Unverified	0
EfficientLLaVA: Generalizable Auto-Pruning for Large Vision-language Models	Jan 1, 2025	MM-Vet, Multimodal Reasoning	Unverified	0
OmniFusion Technical Report	Apr 9, 2024	MM-Vet, TextVQA	Code Available	0
DIEM: Decomposition-Integration Enhancing Multimodal Insights	Jan 1, 2024	MM-Vet, Question Answering	Unverified	0
Enhancing the Spatial Awareness Capability of Multi-Modal Large Language Model	Oct 31, 2023	Autonomous Driving, Language Modeling	Unverified	0


No leaderboard results yet.