SOTAVerified

Visual Question Answering

MLLM Leaderboard

Papers

Showing 1251–1300 of 2177 papers

| Title | Status | Hype |
|---|---|---|
| Multimodal Commonsense Knowledge Distillation for Visual Question Answering | | 0 |
| VisionGPT: Vision-Language Understanding Agent Using Generalized Multimodal Framework | | 0 |
| Multimodal Compact Bilinear Pooling for Multimodal Neural Machine Translation | | 0 |
| Multimodal Continuous Visual Attention Mechanisms | | 0 |
| Multi-modal Deep Analysis for Multimedia | | 0 |
| Multi-Modal Explainable Medical AI Assistant for Trustworthy Human-AI Collaboration | | 0 |
| Vision-Language Models as Success Detectors | | 0 |
| Vision Language Models Can Parse Floor Plan Maps | | 0 |
| Does my multimodal model learn cross-modal interactions? It's harder to tell than you might think! | | 0 |
| Multimodal Few-Shot Learning with Frozen Language Models | | 0 |
| Document Visual Question Answering Challenge 2020 | | 0 |
| Multi-Modal Fusion Transformer for Visual Question Answering in Remote Sensing | | 0 |
| Multimodal Graph Networks for Compositional Generalization in Visual Question Answering | | 0 |
| Multimodal grid features and cell pointers for Scene Text Visual Question Answering | | 0 |
| Multi-Modal Instruction-Tuning Small-Scale Language-and-Vision Assistant for Semiconductor Electron Micrograph Analysis | | 0 |
| Multimodal Integration of Human-Like Attention in Visual Question Answering | | 0 |
| Multimodal Intelligence: Representation Learning, Information Fusion, and Applications | | 0 |
| Document Collection Visual Question Answering | | 0 |
| Multi-modality Latent Interaction Network for Visual Question Answering | | 0 |
| Document AI: Benchmarks, Models and Applications | | 0 |
| Vision-Language Models for Edge Networks: A Comprehensive Survey | | 0 |
| Multimodal Learning and Reasoning for Visual Question Answering | | 0 |
| Scene Graph Reasoning with Prior Visual Relationship for Visual Question Answering | | 0 |
| Multimodal Neural Graph Memory Networks for Visual Question Answering | | 0 |
| DLIP: Distilling Language-Image Pre-training | | 0 |
| A Multimodal Memes Classification: A Survey and Open Research Issues | | 0 |
| Diversity and Consistency: Exploring Visual Question-Answer Pair Generation | | 0 |
| Diversifying Joint Vision-Language Tokenization Learning | | 0 |
| Multimodal Representations for Teacher-Guided Compositional Visual Reasoning | | 0 |
| Multimodal Reranking for Knowledge-Intensive Visual Question Answering | | 0 |
| American == White in Multimodal Language-and-Image AI | | 0 |
| DistilDoc: Knowledge Distillation for Visually-Rich Document Applications | | 0 |
| Multimodal Transformer With a Low-Computational-Cost Guarantee | | 0 |
| Disentangling Knowledge-based and Visual Reasoning by Question Decomposition in KB-VQA | | 0 |
| Multimodal Unified Attention Networks for Vision-and-Language Interactions | | 0 |
| All You May Need for VQA are Image Captions | | 0 |
| All-in-one: Understanding and Generation in Multimodal Reasoning with the MAIA Benchmark | | 0 |
| Discovering Pathology Rationale and Token Allocation for Efficient Multimodal Pathology Reasoning | | 0 |
| Directional Gradient Projection for Robust Fine-Tuning of Foundation Models | | 0 |
| Vision-Language Pretraining: Current Trends and the Future | | 0 |
| DiN: Diffusion Model for Robust Medical VQA with Semantic Noisy Labels | | 0 |
| Multi-task Learning of Hierarchical Vision-Language Representation | | 0 |
| AlignVE: Visual Entailment Recognition Based on Alignment Relations | | 0 |
| Vision LLMs Are Bad at Hierarchical Visual Understanding, and LLMs Are the Bottleneck | | 0 |
| MUST-VQA: MUltilingual Scene-text VQA | | 0 |
| Alignment, Mining and Fusion: Representation Alignment with Hard Negative Mining and Selective Knowledge Fusion for Medical Visual Question Answering | | 0 |
| Differentiable End-to-End Program Executor for Sample and Computationally Efficient VQA | | 0 |
| MuVAM: A Multi-View Attention-based Model for Medical Visual Question Answering | | 0 |
| MyVLM: Personalizing VLMs for User-Specific Queries | | 0 |
| Vision-to-Language Tasks Based on Attributes and Attention Mechanism | | 0 |
Page 26 of 44

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MMCTAgent (GPT-4 + GPT-4V) | GPT-4 score | 74.24 | | Unverified |
| 2 | Qwen2-VL-72B | GPT-4 score | 74 | | Unverified |
| 3 | InternVL2.5-78B | GPT-4 score | 72.3 | | Unverified |
| 4 | GPT-4o +text rationale +IoT | GPT-4 score | 72.2 | | Unverified |
| 5 | Lyra-Pro | GPT-4 score | 71.4 | | Unverified |
| 6 | GLM-4V-Plus | GPT-4 score | 71.1 | | Unverified |
| 7 | Phantom-7B | GPT-4 score | 70.8 | | Unverified |
| 8 | InternVL2.5-38B | GPT-4 score | 68.8 | | Unverified |
| 9 | InternVL2-26B (SGP, token ratio 64%) | GPT-4 score | 65.6 | | Unverified |
| 10 | Baichuan-Omni (7B) | GPT-4 score | 65.4 | | Unverified |