SOTAVerified

Visual Question Answering

MLLM Leaderboard

Papers

Showing 1571–1580 of 2177 papers

| Title | Status | Hype |
| --- | --- | --- |
| RoboMamba: Efficient Vision-Language-Action Model for Robotic Reasoning and Manipulation | | 0 |
| Robotic Environmental State Recognition with Pre-Trained Vision-Language Models and Black-Box Optimization | | 0 |
| Visual Question Answering Using Semantic Information from Image Descriptions | | 0 |
| Characterizing Misclassifications of Deep NLP Models | | 0 |
| Robustness Analysis of Visual QA Models by Basic Questions | | 0 |
| Robusto-1 Dataset: Comparing Humans and VLMs on real out-of-distribution Autonomous Driving VQA from Peru | | 0 |
| Robust Visual Question Answering: Datasets, Methods, and Future Challenges | | 0 |
| Robust Visual Reasoning via Language Guided Neural Module Networks | | 0 |
| Characterizing Datasets for Social Visual Question Answering, and the New TinySocial Dataset | | 0 |
| Visual Question Answering (VQA) on Images with Superimposed Text | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | MMCTAgent (GPT-4 + GPT-4V) | GPT-4 score | 74.24 | | Unverified |
| 2 | Qwen2-VL-72B | GPT-4 score | 74 | | Unverified |
| 3 | InternVL2.5-78B | GPT-4 score | 72.3 | | Unverified |
| 4 | GPT-4o +text rationale +IoT | GPT-4 score | 72.2 | | Unverified |
| 5 | Lyra-Pro | GPT-4 score | 71.4 | | Unverified |
| 6 | GLM-4V-Plus | GPT-4 score | 71.1 | | Unverified |
| 7 | Phantom-7B | GPT-4 score | 70.8 | | Unverified |
| 8 | InternVL2.5-38B | GPT-4 score | 68.8 | | Unverified |
| 9 | InternVL2-26B (SGP, token ratio 64%) | GPT-4 score | 65.6 | | Unverified |
| 10 | Baichuan-Omni (7B) | GPT-4 score | 65.4 | | Unverified |