SOTAVerified

Visual Question Answering

Papers

Showing 1701-1750 of 2177 papers

| Title | Status | Hype |
|---|---|---|
| EaSe: A Diagnostic Tool for VQA based on Answer Diversity | Code | 0 |
| Learning to Select Question-Relevant Relations for Visual Question Answering | | 0 |
| LPF: A Language-Prior Feedback Objective Function for De-biased Visual Question Answering | Code | 0 |
| StructuralLM: Structural Pre-training for Form Understanding | | 0 |
| Probing Inter-modality: Visual Parsing with Self-Attention for Vision-and-Language Pre-training | | 0 |
| Survey of Visual-Semantic Embedding Methods for Zero-Shot Image Retrieval | | 0 |
| Show Why the Answer is Correct! Towards Explainable AI using Compositional Temporal Attention | | 0 |
| Cross-Modal Generative Augmentation for Visual Question Answering | | 0 |
| Proposal-free One-stage Referring Expression via Grid-Word Cross-Attention | | 0 |
| AdaVQA: Overcoming Language Priors with Adapted Margin Cosine Loss | Code | 0 |
| Iterated learning for emergent systematicity in VQA | | 0 |
| A survey on VQA: Datasets and Approaches | | 0 |
| Chop Chop BERT: Visual Question Answering by Chopping VisualBERT's Heads | | 0 |
| Document Collection Visual Question Answering | | 0 |
| InfographicVQA | | 0 |
| Playing Lottery Tickets with Vision and Language | | 0 |
| VGNMN: Video-grounded Neural Module Network to Video-Grounded Language Tasks | | 0 |
| Cross-Modal Retrieval Augmentation for Multi-Modal Classification | | 0 |
| Jointly Learning Truth-Conditional Denotations and Groundings using Parallel Attention | | 0 |
| Neuro-Symbolic VQA: A review from the perspective of AGI desiderata | | 0 |
| CLEVR_HYP: A Challenge Dataset and Baselines for Visual Question Answering with Hypothetical Actions over Images | Code | 0 |
| How Transferable are Reasoning Patterns in VQA? | | 0 |
| Multimodal Continuous Visual Attention Mechanisms | | 0 |
| Compressing Visual-linguistic Model via Knowledge Distillation | | 0 |
| 'Just because you are right, doesn't mean I am wrong': Overcoming a bottleneck in development and evaluation of Open-Ended VQA tasks | | 0 |
| UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training | | 0 |
| Analysis on Image Set Visual Question Answering | | 0 |
| Domain-robust VQA with diverse datasets and methods but no target labels | | 0 |
| 'Just because you are right, doesn't mean I am wrong': Overcoming a Bottleneck in the Development and Evaluation of Open-Ended Visual Question Answering (VQA) Tasks | Code | 0 |
| Generating and Evaluating Explanations of Attended and Error-Inducing Input Regions for VQA Models | | 0 |
| Visual Grounding Strategies for Text-Only Natural Language Processing | | 0 |
| How to Design Sample and Computationally Efficient VQA Models | | 0 |
| Automatic Generation of Contrast Sets from Scene Graphs: Probing the Compositional Consistency of GQA | Code | 0 |
| A Comprehensive Survey of Scene Graphs: Generation and Application | | 0 |
| Characterizing Misclassifications of Deep NLP Models | | 0 |
| RL-CSDia: Representation Learning of Computer Science Diagrams | | 0 |
| Select, Substitute, Search: A New Benchmark for Knowledge-Augmented Visual Question Answering | Code | 0 |
| Contextual Dropout: An Efficient Sample-Dependent Dropout Module | Code | 0 |
| Visual Question Answering: which investigated applications? | Code | 0 |
| Learning Reasoning Paths over Semantic Graphs for Video-grounded Dialogues | | 0 |
| Learning Compositional Representation for Few-shot Visual Question Answering | | 0 |
| Answer Questions with Right Image Regions: A Visual Attention Regularization Approach | Code | 0 |
| An Empirical Study on the Generalization Power of Neural Representations Learned via Visual Guessing Games | | 0 |
| Unanswerable Questions about Images and Texts | | 0 |
| Visual Question Answering based on Local-Scene-Aware Referring Expression Generation | | 0 |
| Understanding in Artificial Intelligence | | 0 |
| Latent Variable Models for Visual Question Answering | | 0 |
| Understanding the Role of Scene Graphs in Visual Question Answering | | 0 |
| Predicting Relative Depth between Objects from Semantic Features | | 0 |
| Self Supervision for Attention Networks | Code | 0 |
Page 35 of 44

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MMCTAgent (GPT-4 + GPT-4V) | GPT-4 score | 74.24 | | Unverified |
| 2 | Qwen2-VL-72B | GPT-4 score | 74 | | Unverified |
| 3 | InternVL2.5-78B | GPT-4 score | 72.3 | | Unverified |
| 4 | GPT-4o +text rationale +IoT | GPT-4 score | 72.2 | | Unverified |
| 5 | Lyra-Pro | GPT-4 score | 71.4 | | Unverified |
| 6 | GLM-4V-Plus | GPT-4 score | 71.1 | | Unverified |
| 7 | Phantom-7B | GPT-4 score | 70.8 | | Unverified |
| 8 | InternVL2.5-38B | GPT-4 score | 68.8 | | Unverified |
| 9 | InternVL2-26B (SGP, token ratio 64%) | GPT-4 score | 65.6 | | Unverified |
| 10 | Baichuan-Omni (7B) | GPT-4 score | 65.4 | | Unverified |