SOTAVerified

Visual Question Answering

MLLM Leaderboard

Papers

Showing 1001-1050 of 2177 papers

| Title | Status | Hype |
| --- | --- | --- |
| American == White in Multimodal Language-and-Image AI | | 0 |
| Learning Sparse Mixture of Experts for Visual Question Answering | | 0 |
| Abduction of Domain Relationships from Data for VQA | | 0 |
| Learning Sparsity for Effective and Efficient Music Performance Question Answering | | 0 |
| Compound Tokens: Channel Fusion for Vision-Language Representation Learning | | 0 |
| Generating Triples with Adversarial Networks for Scene Graph Construction | | 0 |
| Compositional Memory for Visual Question Answering | | 0 |
| Attention Mechanism based Cognition-level Scene Understanding | | 0 |
| Generating Rationales in Visual Question Answering | | 0 |
| DualNet: Domain-Invariant Network for Visual Question Answering | | 0 |
| Generating Natural Questions from Images for Multimodal Assistants | | 0 |
| Attention Guided Semantic Relationship Parsing for Visual Question Answering | | 0 |
| Adversarial Representation Learning for Text-to-Image Matching | | 0 |
| Ontology-based knowledge representation for bone disease diagnosis: a foundation for safe and sustainable medical artificial intelligence systems | | 0 |
| Generating Natural Language Explanations for Visual Question Answering using Scene Graphs and Visual Attention | | 0 |
| Generate then Select: Open-ended Visual Question Answering Guided by World Knowledge | | 0 |
| Neural Reasoning, Fast and Slow, for Video Question Answering | | 0 |
| Learning to Recognize the Unseen Visual Predicates | | 0 |
| Learning to Select Question-Relevant Relations for Visual Question Answering | | 0 |
| Learning to Specialize with Knowledge Distillation for Visual Question Answering | | 0 |
| Explainable High-order Visual Question Reasoning: A New Benchmark and Knowledge-routed Network | | 0 |
| Learning Visual Knowledge Memory Networks for Visual Question Answering | | 0 |
| Compositional Attention Networks for Interpretability in Natural Language Question Answering | | 0 |
| Component Analysis for Visual Question Answering Architectures | | 0 |
| Generalized Hadamard-Product Fusion Operators for Visual Question Answering | | 0 |
| Generalization Differences between End-to-End and Neuro-Symbolic Vision-Language Reasoning Systems | | 0 |
| MF2-MVQA: A Multi-stage Feature Fusion method for Medical Visual Question Answering | | 0 |
| Compact Tensor Pooling for Visual Question Answering | | 0 |
| Gender and Racial Bias in Visual Question Answering Datasets | | 0 |
| Measuring CLEVRness: Black-box Testing of Visual Reasoning Models | | 0 |
| Gemini Pro Defeated by GPT-4V: Evidence from Education | | 0 |
| Measuring CLEVRness: Blackbox testing of Visual Reasoning Models | | 0 |
| GEMeX-ThinkVG: Towards Thinking with Visual Grounding in Medical VQA via Reinforcement Learning | | 0 |
| GEMeX: A Large-Scale, Groundable, and Explainable Medical VQA Benchmark for Chest X-ray Diagnosis | | 0 |
| A Thousand Words Are Worth More Than a Picture: Natural Language-Centric Outside-Knowledge Visual Question Answering | | 0 |
| Measuring Machine Intelligence Through Visual Question Answering | | 0 |
| GC-KBVQA: A New Four-Stage Framework for Enhancing Knowledge Based Visual Question Answering Performance | | 0 |
| Combining Knowledge Graph and LLMs for Enhanced Zero-shot Visual Question Answering | | 0 |
| Gamified crowd-sourcing of high-quality data for visual fine-tuning | | 0 |
| All You May Need for VQA are Image Captions | | 0 |
| AdaDARE-gamma: Balancing Stability and Plasticity in Multi-modal LLMs through Efficient Adaptation | | 0 |
| FVQA: Fact-based Visual Question Answering | | 0 |
| FVQA 2.0: Introducing Adversarial Samples into Fact-based Visual Question Answering | | 0 |
| Fusion of Domain-Adapted Vision and Language Models for Medical Visual Question Answering | | 0 |
| Fusion of Detected Objects in Text for Visual Question Answering | | 0 |
| COIN: Counterfactual Image Generation for VQA Interpretation | | 0 |
| A survey on VQA: Datasets and Approaches | | 0 |
| Med-2E3: A 2D-Enhanced 3D Medical Multimodal Large Language Model | | 0 |
| FunBench: Benchmarking Fundus Reading Skills of MLLMs | | 0 |
| AdaCoder: Adaptive Prompt Compression for Programmatic Visual Question Answering | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | MMCTAgent (GPT-4 + GPT-4V) | GPT-4 score | 74.24 | | Unverified |
| 2 | Qwen2-VL-72B | GPT-4 score | 74 | | Unverified |
| 3 | InternVL2.5-78B | GPT-4 score | 72.3 | | Unverified |
| 4 | GPT-4o +text rationale +IoT | GPT-4 score | 72.2 | | Unverified |
| 5 | Lyra-Pro | GPT-4 score | 71.4 | | Unverified |
| 6 | GLM-4V-Plus | GPT-4 score | 71.1 | | Unverified |
| 7 | Phantom-7B | GPT-4 score | 70.8 | | Unverified |
| 8 | InternVL2.5-38B | GPT-4 score | 68.8 | | Unverified |
| 9 | InternVL2-26B (SGP, token ratio 64%) | GPT-4 score | 65.6 | | Unverified |
| 10 | Baichuan-Omni (7B) | GPT-4 score | 65.4 | | Unverified |