SOTAVerified

Visual Question Answering

MLLM Leaderboard

Papers

Showing 1326–1350 of 2177 papers

Title | Status | Hype
Multi-Modal Fusion Transformer for Visual Question Answering in Remote Sensing | - | 0
Language Prior Is Not the Only Shortcut: A Benchmark for Shortcut Learning in VQA | Code | 1
Towards Robust Visual Question Answering: Making the Most of Biased Samples via Contrastive Learning | Code | 1
MAMO: Masked Multimodal Modeling for Fine-Grained Vision-Language Representation Learning | - | 0
Retrieval Augmented Visual Question Answering with Outside Knowledge | Code | 2
On the Effects of Video Grounding on Language Models | - | 0
Dual Capsule Attention Mask Network with Mutual Learning for Visual Question Answering | - | 0
A Dual-Attention Learning Network with Word and Sentence Embedding for Medical Visual Question Answering | Code | 0
Task Formulation Matters When Learning Continually: A Case Study in Visual Question Answering | Code | 0
Linearly Mapping from Image to Text Space | Code | 1
TVLT: Textless Vision-Language Transformer | Code | 1
RepsNet: Combining Vision with Language for Automated Medical Reports | - | 0
Towards Explainable 3D Grounded Visual Question Answering: A New Benchmark and Strong Baseline | Code | 1
Exploring Modulated Detection Transformer as a Tool for Action Recognition in Videos | Code | 0
Toward 3D Spatial Reasoning for Human-like Text-based Visual Question Answering | - | 0
Continual VQA for Disaster Response Systems | Code | 0
Overcoming Language Priors in Visual Question Answering via Distinguishing Superficially Similar Instances | Code | 0
LAVIS: A Library for Language-Vision Intelligence | - | 0
Correlation Information Bottleneck: Towards Adapting Pretrained Multimodal Models for Robust Visual Question Answering | - | 0
MUST-VQA: MUltilingual Scene-text VQA | - | 0
PaLI: A Jointly-Scaled Multilingual Language-Image Model | - | 0
PreSTU: Pre-Training for Scene-Text Understanding | - | 0
MaXM: Towards Multilingual Visual Question Answering | Code | 1
Pre-training image-language transformers for open-vocabulary tasks | - | 0
Improving the Cross-Lingual Generalisation in Visual Question Answering | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | MMCTAgent (GPT-4 + GPT-4V) | GPT-4 score | 74.24 | - | Unverified
2 | Qwen2-VL-72B | GPT-4 score | 74 | - | Unverified
3 | InternVL2.5-78B | GPT-4 score | 72.3 | - | Unverified
4 | GPT-4o +text rationale +IoT | GPT-4 score | 72.2 | - | Unverified
5 | Lyra-Pro | GPT-4 score | 71.4 | - | Unverified
6 | GLM-4V-Plus | GPT-4 score | 71.1 | - | Unverified
7 | Phantom-7B | GPT-4 score | 70.8 | - | Unverified
8 | InternVL2.5-38B | GPT-4 score | 68.8 | - | Unverified
9 | InternVL2-26B (SGP, token ratio 64%) | GPT-4 score | 65.6 | - | Unverified
10 | Baichuan-Omni (7B) | GPT-4 score | 65.4 | - | Unverified