SOTAVerified

Visual Question Answering

MLLM Leaderboard

Papers

Showing 1501–1550 of 2177 papers

Title | Status | Hype
Visual question answering based evaluation metrics for text-to-image generation | | 0
COIN: Counterfactual Image Generation for VQA Interpretation | | 0
CoG-DQA: Chain-of-Guiding Learning with Large Language Models for Diagram Question Answering | | 0
COCO is "ALL" You Need for Visual Instruction Fine-tuning | | 0
CLOVA: A Closed-Loop Visual Assistant with Tool Usage and Update | | 0
Rank2Tell: A Multimodal Driving Dataset for Joint Importance Ranking and Reasoning | | 0
Ranked from Within: Ranking Large Multimodal Models for Visual Question Answering Without Labels | | 0
Visual Question Answering based on Formal Logic | | 0
RAVEN: A Dataset for Relational and Analogical Visual rEasoNing | | 0
Visual Question Answering based on Local-Scene-Aware Referring Expression Generation | | 0
Reactive Multi-Stage Feature Fusion for Multimodal Dialogue Modeling | | 0
Visual Question Answering Dataset for Bilingual Image Understanding: A Study of Cross-Lingual Transfer Using Attention Maps | | 0
CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering | | 0
Realizing Visual Question Answering for Education: GPT-4V as a Multimodal AI | | 0
CLIP-UP: CLIP-Based Unanswerable Problem Detection for Visual Question Answering | | 0
CLIP-TD: CLIP Targeted Distillation for Vision-Language Tasks | | 0
Reasoning Over History: Context Aware Visual Dialog | | 0
Recent, rapid advancement in visual question answering architecture: a review | | 0
Reciprocal Attention Fusion for Visual Question Answering | | 0
Zero-Shot Visual Question Answering | | 0
Recurrent and Contextual Models for Visual Question Answering | | 0
Visual Question Answering for Cultural Heritage | | 0
CLIP-Powered TASS: Target-Aware Single-Stream Network for Audio-Visual Question Answering | | 0
WoLF: Wide-scope Large Language Model Framework for CXR Understanding | | 0
Reducing Hallucinations: Enhancing VQA for Flood Disaster Damage Assessment with Visual Contexts | | 0
Reducing Language Biases in Visual Question Answering with Visually-Grounded Question Encoder | | 0
CLIP Models are Few-shot Learners: Empirical Studies on VQA and Visual Entailment | | 0
Visual question answering: from early developments to recent advances -- a survey | | 0
Regularizing Attention Networks for Anomaly Detection in Visual Question Answering | | 0
Visual Question Answering in Ophthalmology: A Progressive and Practical Perspective | | 0
CLEVR-POC: Reasoning-Intensive Visual Question Answering in Partially Observable Environments | | 0
ReLoop: "Seeing Twice and Thinking Backwards" via Closed-loop Training to Mitigate Hallucinations in Multimodal understanding | | 0
Visual Question Answering in Remote Sensing with Cross-Attention and Multimodal Information Bottleneck | | 0
Remote Sensing Vision-Language Foundation Models without Annotations via Ground Remote Alignment | | 0
CL-CrossVQA: A Continual Learning Benchmark for Cross-Domain Visual Question Answering | | 0
Claude 3.5 Sonnet Model Card Addendum | | 0
Rephrasing visual questions by specifying the entropy of the answer distribution | | 0
Representation, Learning and Reasoning on Spatial Language for Downstream NLP Tasks | | 0
Representing Movie Characters in Dialogues | | 0
Reproducibility Report for "Learning To Count Objects In Natural Images For Visual Question Answering" | | 0
RepsNet: Combining Vision with Language for Automated Medical Reports | | 0
RescueADI: Adaptive Disaster Interpretation in Remote Sensing Images with Autonomous Agents | | 0
Visual Question Answering Instruction: Unlocking Multimodal Large Language Model To Domain-Specific Visual Multitasks | | 0
CLAMP: Contrastive LAnguage Model Prompt-tuning | | 0
Reassessing Evaluation Practices in Visual Question Answering: A Case Study on Out-of-Distribution Generalization | | 0
Rethinking Visual Prompting for Multimodal Large Language Models with External Knowledge | | 0
VrR-VG: Refocusing Visually-Relevant Relationships | | 0
Retrieval-Augmented Natural Language Reasoning for Explainable Visual Question Answering | | 0
CIC: A Framework for Culturally-Aware Image Captioning | | 0
Retrieval-Augmented Visual Question Answering via Built-in Autoregressive Search Engines | | 0
Page 31 of 44

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | MMCTAgent (GPT-4 + GPT-4V) | GPT-4 score | 74.24 | | Unverified
2 | Qwen2-VL-72B | GPT-4 score | 74 | | Unverified
3 | InternVL2.5-78B | GPT-4 score | 72.3 | | Unverified
4 | GPT-4o +text rationale +IoT | GPT-4 score | 72.2 | | Unverified
5 | Lyra-Pro | GPT-4 score | 71.4 | | Unverified
6 | GLM-4V-Plus | GPT-4 score | 71.1 | | Unverified
7 | Phantom-7B | GPT-4 score | 70.8 | | Unverified
8 | InternVL2.5-38B | GPT-4 score | 68.8 | | Unverified
9 | InternVL2-26B (SGP, token ratio 64%) | GPT-4 score | 65.6 | | Unverified
10 | Baichuan-Omni (7B) | GPT-4 score | 65.4 | | Unverified