SOTAVerified

Visual Question Answering


Papers

Showing 1651–1700 of 2177 papers

Title | Status | Hype
Visual Question Answering in the Medical Domain | | 0
Visual Question Answering on 360° Images | | 0
Visual Question Answering on Image Sets | | 0
Visual Question Answering on Multiple Remote Sensing Image Modalities | | 0
Visual Question Answering Using Semantic Information from Image Descriptions | | 0
Visual Question Answering (VQA) on Images with Superimposed Text | | 0
Visual Question Answering with Memory-Augmented Networks | | 0
Visual Question Answering with Prior Class Semantics | | 0
Visual Question Answering with Question Representation Update (QRU) | | 0
Visual Question Generation as Dual Task of Visual Question Answering | | 0
Visual Question: Predicting If a Crowd Will Agree on the Answer | | 0
Visual Question Reasoning on General Dependency Tree | | 0
Visual Reference Resolution using Attention Memory for Visual Dialog | | 0
Visual Relationship Detection using Scene Graphs: A Survey | | 0
Visual Superordinate Abstraction for Robust Concept Learning | | 0
Visual TTR - Modelling Visual Question Answering in Type Theory with Records | | 0
ViT3D Alignment of LLaMA3: 3D Medical Image Report Generation | | 0
ViUniT: Visual Unit Tests for More Robust Visual Programming | | 0
VL-BEiT: Generative Vision-Language Pretraining | | 0
VLFeedback: A Large-Scale AI Feedback Dataset for Large Vision-Language Models Alignment | | 0
VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks | | 0
VLMAE: Vision-Language Masked Autoencoder | | 0
VL-Mamba: Exploring State Space Models for Multimodal Learning | | 0
VLM-Assisted Continual learning for Visual Question Answering in Self-Driving | | 0
VLR-Bench: Multilingual Benchmark Dataset for Vision-Language Retrieval Augmented Generation | | 0
EVJVQA Challenge: Multilingual Visual Question Answering | | 0
VolDoGer: LLM-assisted Datasets for Domain Generalization in Vision-Language Tasks | | 0
VQA-Aid: Visual Question Answering for Post-Disaster Damage Assessment and Analysis | | 0
VQABQ: Visual Question Answering by Basic Questions | | 0
VQA-Diff: Exploiting VQA and Diffusion for Zero-Shot Image-to-3D Vehicle Asset Generation in Autonomous Driving | | 0
VQA-E: Explaining, Elaborating, and Enhancing Your Answers for Visual Questions | | 0
VQA-GEN: A Visual Question Answering Benchmark for Domain Generalization | | 0
VQA-GNN: Reasoning with Multimodal Knowledge via Graph Neural Networks for Visual Question Answering | | 0
VQA-LOL: Visual Question Answering under the Lens of Logic | | 0
VQA-MHUG: A Gaze Dataset to Study Multimodal Neural Attention in Visual Question Answering | | 0
VQA Training Sets are Self-play Environments for Generating Few-shot Pools | | 0
VQAttack: Transferable Adversarial Attacks on Visual Question Answering via Pre-trained Models | | 0
VQA with Cascade of Self- and Co-Attention Blocks | | 0
VSA4VQA: Scaling a Vector Symbolic Architecture to Visual Question Answering on Natural Images | | 0
WangLab at MEDIQA-M3G 2024: Multimodal Medical Answer Generation using Large Language Models | | 0
Weak Supervision helps Emergence of Word-Object Alignment and improves Vision-Language Tasks | | 0
What If We Recaption Billions of Web Images with LLaMA-3? | | 0
What is needed for simple spatial language capabilities in VQA? | | 0
What Large Language Models Bring to Text-rich VQA? | | 0
When are Lemons Purple? The Concept Association Bias of Vision-Language Models | | 0
Where is this coming from? Making groundedness count in the evaluation of Document VQA models | | 0
Where To Look: Focus Regions for Visual Question Answering | | 0
Which Client is Reliable?: A Reliable and Personalized Prompt-based Federated Learning for Medical Image Question Answering | | 0
Why context matters in VQA and Reasoning: Semantic interventions for VLM input modalities | | 0
Why Does a Visual Question Have Different Answers? | | 0
Page 34 of 44

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | MMCTAgent (GPT-4 + GPT-4V) | GPT-4 score | 74.24 | | Unverified
2 | Qwen2-VL-72B | GPT-4 score | 74 | | Unverified
3 | InternVL2.5-78B | GPT-4 score | 72.3 | | Unverified
4 | GPT-4o +text rationale +IoT | GPT-4 score | 72.2 | | Unverified
5 | Lyra-Pro | GPT-4 score | 71.4 | | Unverified
6 | GLM-4V-Plus | GPT-4 score | 71.1 | | Unverified
7 | Phantom-7B | GPT-4 score | 70.8 | | Unverified
8 | InternVL2.5-38B | GPT-4 score | 68.8 | | Unverified
9 | InternVL2-26B (SGP, token ratio 64%) | GPT-4 score | 65.6 | | Unverified
10 | Baichuan-Omni (7B) | GPT-4 score | 65.4 | | Unverified