SOTAVerified

Visual Question Answering

Papers

Showing 1051–1060 of 2177 papers

Title | Status | Hype
Toloka Visual Question Answering Benchmark | Code | 1
Tackling VQA with Pretrained Foundation Models without Further Training | - | 0
Sentence Attention Blocks for Answer Grounding | - | 0
Visual Question Answering in the Medical Domain | - | 0
DreamLLM: Synergistic Multimodal Comprehension and Creation | Code | 2
KOSMOS-2.5: A Multimodal Literate Model | - | 0
An Empirical Study of Scaling Instruct-Tuned Large Multimodal Models | Code | 6
Syntax Tree Constrained Graph Network for Visual Question Answering | - | 0
D3: Data Diversity Design for Systematic Generalization in Visual Question Answering | Code | 0
TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wild | Code | 1
Page 106 of 218

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | MMCTAgent (GPT-4 + GPT-4V) | GPT-4 score | 74.24 | - | Unverified
2 | Qwen2-VL-72B | GPT-4 score | 74 | - | Unverified
3 | InternVL2.5-78B | GPT-4 score | 72.3 | - | Unverified
4 | GPT-4o +text rationale +IoT | GPT-4 score | 72.2 | - | Unverified
5 | Lyra-Pro | GPT-4 score | 71.4 | - | Unverified
6 | GLM-4V-Plus | GPT-4 score | 71.1 | - | Unverified
7 | Phantom-7B | GPT-4 score | 70.8 | - | Unverified
8 | InternVL2.5-38B | GPT-4 score | 68.8 | - | Unverified
9 | InternVL2-26B (SGP, token ratio 64%) | GPT-4 score | 65.6 | - | Unverified
10 | Baichuan-Omni (7B) | GPT-4 score | 65.4 | - | Unverified