SOTAVerified

Visual Question Answering

MLLM Leaderboard

Papers

Showing 651–660 of 2177 papers

| Title | Status | Hype |
|---|---|---|
| Domain-robust VQA with diverse datasets and methods but no target labels | | 0 |
| How to Design Sample and Computationally Efficient VQA Models | | 0 |
| Domain Adaptation of VLM for Soccer Video Understanding | | 0 |
| Do Explanations make VQA Models more Predictable to a Human? | | 0 |
| Adversarial Regularization for Visual Question Answering: Strengths, Shortcomings, and Side Effects | | 0 |
| How to find a good image-text embedding for remote sensing visual question answering? | | 0 |
| How Transferable are Reasoning Patterns in VQA? | | 0 |
| Does Object Grounding Really Reduce Hallucination of Large Vision-Language Models? | | 0 |
| Does my multimodal model learn cross-modal interactions? It's harder to tell than you might think! | | 0 |
| Boosting Cross-task Transferability of Adversarial Patches with Visual Relations | | 0 |
Page 66 of 218

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MMCTAgent (GPT-4 + GPT-4V) | GPT-4 score | 74.24 | | Unverified |
| 2 | Qwen2-VL-72B | GPT-4 score | 74 | | Unverified |
| 3 | InternVL2.5-78B | GPT-4 score | 72.3 | | Unverified |
| 4 | GPT-4o +text rationale +IoT | GPT-4 score | 72.2 | | Unverified |
| 5 | Lyra-Pro | GPT-4 score | 71.4 | | Unverified |
| 6 | GLM-4V-Plus | GPT-4 score | 71.1 | | Unverified |
| 7 | Phantom-7B | GPT-4 score | 70.8 | | Unverified |
| 8 | InternVL2.5-38B | GPT-4 score | 68.8 | | Unverified |
| 9 | InternVL2-26B (SGP, token ratio 64%) | GPT-4 score | 65.6 | | Unverified |
| 10 | Baichuan-Omni (7B) | GPT-4 score | 65.4 | | Unverified |