Visual Question Answering

MLLM Leaderboard

Papers

Showing 1301–1310 of 2177 papers

Title	Date	Tasks	Status	Hype	Score
NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples	Oct 18, 2024	Attribute, Question Answering	Unverified	0	0
Detection-based Intermediate Supervision for Visual Question Answering	Dec 26, 2023	cross-modal alignment, Logical Reasoning	Unverified	0	0
Natural Language Understanding and Inference with MLLM in Visual Question Answering: A Survey	Nov 26, 2024	Natural Language Understanding, Question Answering	Unverified	0	0
Natural Reflection Backdoor Attack on Vision Language Model for Autonomous Driving	May 9, 2025	Autonomous Driving, Backdoor Attack	Unverified	0	0
Detecting and Evaluating Medical Hallucinations in Large Vision Language Models	Jun 14, 2024	Hallucination, Medical Visual Question Answering	Unverified	0	0
Negative Object Presence Evaluation (NOPE) to Measure Object Hallucination in Vision-Language Models	Oct 9, 2023	Hallucination, Object	Unverified	0	0
Neglected Risks: The Disturbing Reality of Children's Images in Datasets and the Urgent Call for Accountability	Apr 20, 2025	Question Answering, Visual Question Answering	Unverified	0	0
NegVQA: Can Vision Language Models Understand Negation?	May 28, 2025	Negation, Question Answering	Unverified	0	0
Aligning MAGMA by Few-Shot Learning and Finetuning	Oct 18, 2022	Few-Shot Learning, Image Captioning	Unverified	0	0
Neural Attention Models for Sequence Classification: Analysis and Application to Key Term Extraction and Dialogue Act Detection	Mar 31, 2016	Caption Generation, Classification	Unverified	0	0

Datasets: MM-Vet, ViP-Bench, VQA v2 test-dev, BenchLMM, MMBench, V*bench, VQA v2 val, MSRVTT-QA, VQA v2 test-std, MMHal-Bench, MSVD-QA, PlotQA-D1

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	MMCTAgent (GPT-4 + GPT-4V)	GPT-4 score	74.24	—	Unverified
2	Qwen2-VL-72B	GPT-4 score	74	—	Unverified
3	InternVL2.5-78B	GPT-4 score	72.3	—	Unverified
4	GPT-4o +text rationale +IoT	GPT-4 score	72.2	—	Unverified
5	Lyra-Pro	GPT-4 score	71.4	—	Unverified
6	GLM-4V-Plus	GPT-4 score	71.1	—	Unverified
7	Phantom-7B	GPT-4 score	70.8	—	Unverified
8	InternVL2.5-38B	GPT-4 score	68.8	—	Unverified
9	InternVL2-26B (SGP, token ratio 64%)	GPT-4 score	65.6	—	Unverified
10	Baichuan-Omni (7B)	GPT-4 score	65.4	—	Unverified