SOTAVerified

Visual Question Answering

MLLM Leaderboard

Papers

Showing 626 to 650 of 2177 papers

Title | Status | Hype
OmniFusion Technical Report | Code | 0
DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue | Code | 0
Dual Recurrent Attention Units for Visual Question Answering | Code | 0
Bridging Vision and Language Spaces with Assignment Prediction | Code | 0
Adversarial Training with OCR Modality Perturbation for Scene-Text Visual Question Answering | Code | 0
OG-SGG: Ontology-Guided Scene Graph Generation. A Case Study in Transfer Learning for Telepresence Robotics | Code | 0
On Modality Bias Recognition and Reduction | Code | 0
Dual Attention Networks for Visual Reference Resolution in Visual Dialog | Code | 0
Dual Attention Networks for Multimodal Reasoning and Matching | Code | 0
Object Attribute Matters in Visual Question Answering | Code | 0
DrishtiKon: Multi-Granular Visual Grounding for Text-Rich Document Images | Code | 0
Object-aware Adaptive-Positivity Learning for Audio-Visual Question Answering | Code | 0
Towards Flexible Evaluation for Generative Visual Question Answering | Code | 0
Answer Them All! Toward Universal Visual Question Answering Models | Code | 0
Neural Module Networks | Code | 0
Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding | Code | 0
Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering | Code | 0
Answer Questions with Right Image Regions: A Visual Attention Regularization Approach | Code | 0
Self-Bootstrapped Visual-Language Model for Knowledge Selection and Question Answering | Code | 0
NeSyCoCo: A Neuro-Symbolic Concept Composer for Compositional Generalization | Code | 0
No Images, No Problem: Retaining Knowledge in Continual VQA with Questions-Only Memory | Code | 0
MUTAN: Multimodal Tucker Fusion for Visual Question Answering | Code | 0
Does Chain-of-Thought Reasoning Help Mobile GUI Agent? An Empirical Study | Code | 0
Music's Multimodal Complexity in AVQA: Why We Need More than General Multimodal LLMs | Code | 0
Multi-Sourced Compositional Generalization in Visual Question Answering | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | MMCTAgent (GPT-4 + GPT-4V) | GPT-4 score | 74.24 | | Unverified
2 | Qwen2-VL-72B | GPT-4 score | 74 | | Unverified
3 | InternVL2.5-78B | GPT-4 score | 72.3 | | Unverified
4 | GPT-4o +text rationale +IoT | GPT-4 score | 72.2 | | Unverified
5 | Lyra-Pro | GPT-4 score | 71.4 | | Unverified
6 | GLM-4V-Plus | GPT-4 score | 71.1 | | Unverified
7 | Phantom-7B | GPT-4 score | 70.8 | | Unverified
8 | InternVL2.5-38B | GPT-4 score | 68.8 | | Unverified
9 | InternVL2-26B (SGP, token ratio 64%) | GPT-4 score | 65.6 | | Unverified
10 | Baichuan-Omni (7B) | GPT-4 score | 65.4 | | Unverified