SOTAVerified

Visual Question Answering

MLLM Leaderboard

Papers

Showing 611-620 of 2177 papers

| Title | Status | Hype |
|---|---|---|
| Calibrating Concepts and Operations: Towards Symbolic Reasoning on Real Images | Code | 1 |
| Hallucination Augmented Contrastive Learning for Multimodal Large Language Model | Code | 1 |
| Dynamic Language Binding in Relational Visual Reasoning | Code | 1 |
| Expressive Scene Graph Generation Using Commonsense Knowledge Infusion for Visual Understanding and Reasoning | Code | 1 |
| Faithful Multimodal Explanation for Visual Question Answering | Code | 1 |
| HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning | Code | 1 |
| IllusionVQA: A Challenging Optical Illusion Dataset for Vision Language Models | Code | 1 |
| Kosmos-2: Grounding Multimodal Large Language Models to the World | Code | 1 |
| FaceBench: A Multi-View Multi-Level Facial Attribute VQA Dataset for Benchmarking Face Perception MLLMs | Code | 1 |
| Many Heads but One Brain: Fusion Brain -- a Competition and a Single Multimodal Multitask Architecture | Code | 1 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MMCTAgent (GPT-4 + GPT-4V) | GPT-4 score | 74.24 | | Unverified |
| 2 | Qwen2-VL-72B | GPT-4 score | 74 | | Unverified |
| 3 | InternVL2.5-78B | GPT-4 score | 72.3 | | Unverified |
| 4 | GPT-4o +text rationale +IoT | GPT-4 score | 72.2 | | Unverified |
| 5 | Lyra-Pro | GPT-4 score | 71.4 | | Unverified |
| 6 | GLM-4V-Plus | GPT-4 score | 71.1 | | Unverified |
| 7 | Phantom-7B | GPT-4 score | 70.8 | | Unverified |
| 8 | InternVL2.5-38B | GPT-4 score | 68.8 | | Unverified |
| 9 | InternVL2-26B (SGP, token ratio 64%) | GPT-4 score | 65.6 | | Unverified |
| 10 | Baichuan-Omni (7B) | GPT-4 score | 65.4 | | Unverified |