SOTAVerified

Visual Question Answering

MLLM Leaderboard

Papers

Showing 1351-1400 of 2177 papers

Title | Status | Hype
Question-Driven Graph Fusion Network For Visual Question Answering | | 0
Question Generation for Evaluating Cross-Dataset Shifts in Multi-modal Grounding | | 0
Question-Guided Hybrid Convolution for Visual Question Answering | | 0
Question Guided Modular Routing Networks for Visual Question Answering | | 0
Question-Led Semantic Structure Enhanced Attentions for VQA | | 0
Question Modifiers in Visual Question Answering | | 0
Question Relevance in Visual Question Answering | | 0
Question Relevance in VQA: Identifying Non-Visual And False-Premise Questions | | 0
Question Type Guided Attention in Visual Question Answering | | 0
Rank2Tell: A Multimodal Driving Dataset for Joint Importance Ranking and Reasoning | | 0
Ranked from Within: Ranking Large Multimodal Models for Visual Question Answering Without Labels | | 0
RAVEN: A Dataset for Relational and Analogical Visual rEasoNing | | 0
Reactive Multi-Stage Feature Fusion for Multimodal Dialogue Modeling | | 0
Realizing Visual Question Answering for Education: GPT-4V as a Multimodal AI | | 0
Reasoning Over History: Context Aware Visual Dialog | | 0
Recent, rapid advancement in visual question answering architecture: a review | | 0
Reciprocal Attention Fusion for Visual Question Answering | | 0
Recurrent and Contextual Models for Visual Question Answering | | 0
Reducing Hallucinations: Enhancing VQA for Flood Disaster Damage Assessment with Visual Contexts | | 0
Reducing Language Biases in Visual Question Answering with Visually-Grounded Question Encoder | | 0
Regularizing Attention Networks for Anomaly Detection in Visual Question Answering | | 0
ReLoop: "Seeing Twice and Thinking Backwards" via Closed-loop Training to Mitigate Hallucinations in Multimodal understanding | | 0
Remote Sensing Vision-Language Foundation Models without Annotations via Ground Remote Alignment | | 0
Rephrasing visual questions by specifying the entropy of the answer distribution | | 0
Representation, Learning and Reasoning on Spatial Language for Downstream NLP Tasks | | 0
Representing Movie Characters in Dialogues | | 0
Reproducibility Report for "Learning To Count Objects In Natural Images For Visual Question Answering" | | 0
RepsNet: Combining Vision with Language for Automated Medical Reports | | 0
RescueADI: Adaptive Disaster Interpretation in Remote Sensing Images with Autonomous Agents | | 0
Reassessing Evaluation Practices in Visual Question Answering: A Case Study on Out-of-Distribution Generalization | | 0
Rethinking Visual Prompting for Multimodal Large Language Models with External Knowledge | | 0
VrR-VG: Refocusing Visually-Relevant Relationships | | 0
Retrieval-Augmented Natural Language Reasoning for Explainable Visual Question Answering | | 0
Retrieval-Augmented Visual Question Answering via Built-in Autoregressive Search Engines | | 0
PAR: Prompt-Aware Token Reduction Method for Efficient Large Multimodal Models | | 0
Retrieving Visual Facts For Few-Shot Visual Question Answering | | 0
Reusable Slotwise Mechanisms | | 0
Revisiting Multi-Modal LLM Evaluation | | 0
ReWind: Understanding Long Videos with Instructed Learnable Memory | | 0
ReXVQA: A Large-scale Visual Question Answering Benchmark for Generalist Chest X-ray Understanding | | 0
RL-CSDia: Representation Learning of Computer Science Diagrams | | 0
R-LLaVA: Improving Med-VQA Understanding through Visual Region of Interest | | 0
RMLVQA: A Margin Loss Approach for Visual Question Answering With Language Biases | | 0
Robo2VLM: Visual Question Answering from Large-Scale In-the-Wild Robot Manipulation Datasets | | 0
RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis | | 0
RoboMamba: Efficient Vision-Language-Action Model for Robotic Reasoning and Manipulation | | 0
Robotic Environmental State Recognition with Pre-Trained Vision-Language Models and Black-Box Optimization | | 0
Robustness Analysis of Visual QA Models by Basic Questions | | 0
Robusto-1 Dataset: Comparing Humans and VLMs on real out-of-distribution Autonomous Driving VQA from Peru | | 0
Robust Visual Question Answering: Datasets, Methods, and Future Challenges | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | MMCTAgent (GPT-4 + GPT-4V) | GPT-4 score | 74.24 | | Unverified
2 | Qwen2-VL-72B | GPT-4 score | 74 | | Unverified
3 | InternVL2.5-78B | GPT-4 score | 72.3 | | Unverified
4 | GPT-4o +text rationale +IoT | GPT-4 score | 72.2 | | Unverified
5 | Lyra-Pro | GPT-4 score | 71.4 | | Unverified
6 | GLM-4V-Plus | GPT-4 score | 71.1 | | Unverified
7 | Phantom-7B | GPT-4 score | 70.8 | | Unverified
8 | InternVL2.5-38B | GPT-4 score | 68.8 | | Unverified
9 | InternVL2-26B (SGP, token ratio 64%) | GPT-4 score | 65.6 | | Unverified
10 | Baichuan-Omni (7B) | GPT-4 score | 65.4 | | Unverified