SOTAVerified

Visual Question Answering

Papers

Showing 951–960 of 2177 papers

Title | Status | Hype
MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks | Code | 0
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering | Code | 0
Kvasir-VQA: A Text-Image Pair GI Tract Dataset | Code | 0
Kvasir-VQA-x1: A Multimodal Dataset for Medical Reasoning and Robust MedVQA in Gastrointestinal Endoscopy | Code | 0
LXMERT Model Compression for Visual Question Answering | Code | 0
Enhancing Vietnamese VQA through Curriculum Learning on Raw and Augmented Text Representations | Code | 0
LPF: A Language-Prior Feedback Objective Function for De-biased Visual Question Answering | Code | 0
Are VLMs Really Blind | Code | 0
Loss re-scaling VQA: Revisiting the Language Prior Problem from a Class-imbalance View | Code | 0
Logical Implications for Visual Question Answering Consistency | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | MMCTAgent (GPT-4 + GPT-4V) | GPT-4 score | 74.24 | | Unverified
2 | Qwen2-VL-72B | GPT-4 score | 74 | | Unverified
3 | InternVL2.5-78B | GPT-4 score | 72.3 | | Unverified
4 | GPT-4o +text rationale +IoT | GPT-4 score | 72.2 | | Unverified
5 | Lyra-Pro | GPT-4 score | 71.4 | | Unverified
6 | GLM-4V-Plus | GPT-4 score | 71.1 | | Unverified
7 | Phantom-7B | GPT-4 score | 70.8 | | Unverified
8 | InternVL2.5-38B | GPT-4 score | 68.8 | | Unverified
9 | InternVL2-26B (SGP, token ratio 64%) | GPT-4 score | 65.6 | | Unverified
10 | Baichuan-Omni (7B) | GPT-4 score | 65.4 | | Unverified