SOTAVerified

Visual Question Answering

MLLM Leaderboard

Papers

Showing 1341-1350 of 2177 papers

Title | Status | Hype
Continual VQA for Disaster Response Systems | Code | 0
Overcoming Language Priors in Visual Question Answering via Distinguishing Superficially Similar Instances | Code | 0
LAVIS: A Library for Language-Vision Intelligence |  | 0
Correlation Information Bottleneck: Towards Adapting Pretrained Multimodal Models for Robust Visual Question Answering |  | 0
MUST-VQA: MUltilingual Scene-text VQA |  | 0
PaLI: A Jointly-Scaled Multilingual Language-Image Model |  | 0
PreSTU: Pre-Training for Scene-Text Understanding |  | 0
MaXM: Towards Multilingual Visual Question Answering | Code | 1
Pre-training image-language transformers for open-vocabulary tasks |  | 0
Improving the Cross-Lingual Generalisation in Visual Question Answering | Code | 0
Page 135 of 218

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | MMCTAgent (GPT-4 + GPT-4V) | GPT-4 score | 74.24 |  | Unverified
2 | Qwen2-VL-72B | GPT-4 score | 74 |  | Unverified
3 | InternVL2.5-78B | GPT-4 score | 72.3 |  | Unverified
4 | GPT-4o +text rationale +IoT | GPT-4 score | 72.2 |  | Unverified
5 | Lyra-Pro | GPT-4 score | 71.4 |  | Unverified
6 | GLM-4V-Plus | GPT-4 score | 71.1 |  | Unverified
7 | Phantom-7B | GPT-4 score | 70.8 |  | Unverified
8 | InternVL2.5-38B | GPT-4 score | 68.8 |  | Unverified
9 | InternVL2-26B (SGP, token ratio 64%) | GPT-4 score | 65.6 |  | Unverified
10 | Baichuan-Omni (7B) | GPT-4 score | 65.4 |  | Unverified