SOTAVerified

Visual Question Answering

MLLM Leaderboard

Papers

Showing 961-970 of 2177 papers

Title | Status | Hype
Enhancing Vietnamese VQA through Curriculum Learning on Raw and Augmented Text Representations | Code | 0
Are VLMs Really Blind | Code | 0
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering | Code | 0
LXMERT Model Compression for Visual Question Answering | Code | 0
CAST: Cross-modal Alignment Similarity Test for Vision Language Models | Code | 0
Enhancing Cross-Prompt Transferability in Vision-Language Models through Contextual Injection of Target Tokens | Code | 0
Are Vision LLMs Road-Ready? A Comprehensive Benchmark for Safety-Critical Driving Video Understanding | Code | 0
Enhancing Continual Learning in Visual Question Answering with Modality-Aware Feature Distillation | Code | 0
Enhancing Compositional Reasoning in Vision-Language Models with Synthetic Preference Data | Code | 0
Cascaded Mutual Modulation for Visual Reasoning | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | MMCTAgent (GPT-4 + GPT-4V) | GPT-4 score | 74.24 | - | Unverified
2 | Qwen2-VL-72B | GPT-4 score | 74 | - | Unverified
3 | InternVL2.5-78B | GPT-4 score | 72.3 | - | Unverified
4 | GPT-4o +text rationale +IoT | GPT-4 score | 72.2 | - | Unverified
5 | Lyra-Pro | GPT-4 score | 71.4 | - | Unverified
6 | GLM-4V-Plus | GPT-4 score | 71.1 | - | Unverified
7 | Phantom-7B | GPT-4 score | 70.8 | - | Unverified
8 | InternVL2.5-38B | GPT-4 score | 68.8 | - | Unverified
9 | InternVL2-26B (SGP, token ratio 64%) | GPT-4 score | 65.6 | - | Unverified
10 | Baichuan-Omni (7B) | GPT-4 score | 65.4 | - | Unverified