SOTAVerified

Visual Question Answering

MLLM Leaderboard

Papers

Showing 1151–1160 of 2177 papers

Title | Status | Hype
Dense and Aligned Captions (DAC) Promote Compositional Reasoning in VL Models | Code | 1
Unveiling Cross Modality Bias in Visual Question Answering: A Causal View with Possible Worlds VQA | | 0
Generate then Select: Open-ended Visual Question Answering Guided by World Knowledge | | 0
Multi-Scale Attention for Audio Question Answering | Code | 1
HaVQA: A Dataset for Visual Question Answering and Multimodal Research in Hausa Language | Code | 0
Modularized Zero-shot VQA with Pre-trained Models | Code | 0
CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers | Code | 1
Zero-shot Visual Question Answering with Language Model Feedback | Code | 0
Mindstorms in Natural Language-Based Societies of Mind | | 0
BiomedGPT: A Generalist Vision-Language Foundation Model for Diverse Biomedical Tasks | Code | 2

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | MMCTAgent (GPT-4 + GPT-4V) | GPT-4 score | 74.24 | | Unverified
2 | Qwen2-VL-72B | GPT-4 score | 74 | | Unverified
3 | InternVL2.5-78B | GPT-4 score | 72.3 | | Unverified
4 | GPT-4o +text rationale +IoT | GPT-4 score | 72.2 | | Unverified
5 | Lyra-Pro | GPT-4 score | 71.4 | | Unverified
6 | GLM-4V-Plus | GPT-4 score | 71.1 | | Unverified
7 | Phantom-7B | GPT-4 score | 70.8 | | Unverified
8 | InternVL2.5-38B | GPT-4 score | 68.8 | | Unverified
9 | InternVL2-26B (SGP, token ratio 64%) | GPT-4 score | 65.6 | | Unverified
10 | Baichuan-Omni (7B) | GPT-4 score | 65.4 | | Unverified