SOTAVerified

Visual Question Answering

MLLM Leaderboard

Papers

Showing 1976–2000 of 2177 papers

| Title | Status | Hype |
| --- | --- | --- |
| RealCQA: Scientific Chart Question Answering as a Test-bed for First-Order Logic | Code | 0 |
| Kvasir-VQA: A Text-Image Pair GI Tract Dataset | Code | 0 |
| A Neuro-Symbolic ASP Pipeline for Visual Question Answering | Code | 0 |
| KOFFVQA: An Objectively Evaluated Free-form VQA Benchmark for Large Vision-Language Models in the Korean Language | Code | 0 |
| Knowledge Generation for Zero-shot Knowledge-based VQA | Code | 0 |
| Toward Multi-Granularity Decision-Making: Explicit Visual Reasoning with Hierarchical Knowledge | Code | 0 |
| Towards Addressing the Misalignment of Object Proposal Evaluation for Vision-Language Tasks via Semantic Grounding | Code | 0 |
| Dual Attention Networks for Multimodal Reasoning and Matching | Code | 0 |
| Recommending Themes for Ad Creative Design via Visual-Linguistic Representations | Code | 0 |
| DrishtiKon: Multi-Granular Visual Grounding for Text-Rich Document Images | Code | 0 |
| Recursive Visual Attention in Visual Dialog | Code | 0 |
| Knowledge Acquisition Disentanglement for Knowledge-based Visual Question Answering with Large Language Models | Code | 0 |
| ReDiT: Re‑evaluating large visual question answering model confidence by defining input scenario Difficulty and applying Temperature mapping | Code | 0 |
| Towards a performance analysis on pre-trained Visual Question Answering models for autonomous driving | Code | 0 |
| Cascaded Mutual Modulation for Visual Reasoning | Code | 0 |
| Knowing Earlier what Right Means to You: A Comprehensive VQA Dataset for Grounding Relative Directions via Multi-Task Learning | Code | 0 |
| Reframing Spatial Reasoning Evaluation in Language Models: A Real-World Simulation Benchmark for Qualitative Reasoning | Code | 0 |
| Towards a Unified Multimodal Reasoning Framework | Code | 0 |
| Relation-Aware Graph Attention Network for Visual Question Answering | Code | 0 |
| 'Just because you are right, doesn't mean I am wrong': Overcoming a Bottleneck in the Development and Evaluation of Open-Ended Visual Question Answering (VQA) Tasks | Code | 0 |
| Adaptive loose optimization for robust question answering | Code | 0 |
| REMIND Your Neural Network to Prevent Catastrophic Forgetting | Code | 0 |
| Bridging Vision and Language Spaces with Assignment Prediction | Code | 0 |
| Joint Visual and Text Prompting for Improved Object-Centric Perception with Multimodal Large Language Models | Code | 0 |
| Joint Answering and Explanation for Visual Commonsense Reasoning | Code | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | MMCTAgent (GPT-4 + GPT-4V) | GPT-4 score | 74.24 | | Unverified |
| 2 | Qwen2-VL-72B | GPT-4 score | 74 | | Unverified |
| 3 | InternVL2.5-78B | GPT-4 score | 72.3 | | Unverified |
| 4 | GPT-4o +text rationale +IoT | GPT-4 score | 72.2 | | Unverified |
| 5 | Lyra-Pro | GPT-4 score | 71.4 | | Unverified |
| 6 | GLM-4V-Plus | GPT-4 score | 71.1 | | Unverified |
| 7 | Phantom-7B | GPT-4 score | 70.8 | | Unverified |
| 8 | InternVL2.5-38B | GPT-4 score | 68.8 | | Unverified |
| 9 | InternVL2-26B (SGP, token ratio 64%) | GPT-4 score | 65.6 | | Unverified |
| 10 | Baichuan-Omni (7B) | GPT-4 score | 65.4 | | Unverified |