| Visual question answering based evaluation metrics for text-to-image generation | Nov 15, 2024 | Image Generation, Image Manipulation | —Unverified | 0 | 0 |
| COIN: Counterfactual Image Generation for VQA Interpretation | Jan 10, 2022 | counterfactual, Image Generation | —Unverified | 0 | 0 |
| CoG-DQA: Chain-of-Guiding Learning with Large Language Models for Diagram Question Answering | Jan 1, 2024 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| COCO is "ALL" You Need for Visual Instruction Fine-tuning | Jan 17, 2024 | All, Image Captioning | —Unverified | 0 | 0 |
| CLOVA: A Closed-Loop Visual Assistant with Tool Usage and Update | Dec 18, 2023 | Continual Learning, Question Answering | —Unverified | 0 | 0 |
| Rank2Tell: A Multimodal Driving Dataset for Joint Importance Ranking and Reasoning | Sep 12, 2023 | Autonomous Vehicles, Question Answering | —Unverified | 0 | 0 |
| Ranked from Within: Ranking Large Multimodal Models for Visual Question Answering Without Labels | Dec 9, 2024 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Visual Question Answering based on Formal Logic | Nov 8, 2021 | Formal Logic, Question Answering | —Unverified | 0 | 0 |
| RAVEN: A Dataset for Relational and Analogical Visual rEasoNing | Mar 7, 2019 | Object Recognition, Question Answering | —Unverified | 0 | 0 |
| Visual Question Answering based on Local-Scene-Aware Referring Expression Generation | Jan 22, 2021 | Question Answering, Referring Expression | —Unverified | 0 | 0 |
| Reactive Multi-Stage Feature Fusion for Multimodal Dialogue Modeling | Aug 14, 2019 | Question Answering, Scene-Aware Dialogue | —Unverified | 0 | 0 |
| Visual Question Answering Dataset for Bilingual Image Understanding: A Study of Cross-Lingual Transfer Using Attention Maps | Aug 1, 2018 | Cross-Lingual Transfer, Image Captioning | —Unverified | 0 | 0 |
| CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering | Mar 1, 2025 | Continual Learning, Language Modeling | —Unverified | 0 | 0 |
| Realizing Visual Question Answering for Education: GPT-4V as a Multimodal AI | May 12, 2024 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| CLIP-UP: CLIP-Based Unanswerable Problem Detection for Visual Question Answering | Jan 2, 2025 | Multiple-choice, Question Answering | —Unverified | 0 | 0 |
| CLIP-TD: CLIP Targeted Distillation for Vision-Language Tasks | Jan 15, 2022 | Question Answering, Visual Commonsense Reasoning | —Unverified | 0 | 0 |
| Reasoning Over History: Context Aware Visual Dialog | Nov 2, 2020 | coreference-resolution, Coreference Resolution | —Unverified | 0 | 0 |
| Recent, rapid advancement in visual question answering architecture: a review | Mar 2, 2022 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Reciprocal Attention Fusion for Visual Question Answering | May 11, 2018 | Object, Question Answering | —Unverified | 0 | 0 |
| Zero-Shot Visual Question Answering | Nov 17, 2016 | Question Answering, Retrieval | —Unverified | 0 | 0 |
| Recurrent and Contextual Models for Visual Question Answering | Mar 23, 2017 | Diversity, Multiple-choice | —Unverified | 0 | 0 |
| Visual Question Answering for Cultural Heritage | Mar 22, 2020 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| CLIP-Powered TASS: Target-Aware Single-Stream Network for Audio-Visual Question Answering | May 13, 2024 | Audio-visual Question Answering, Audio-Visual Question Answering (AVQA) | —Unverified | 0 | 0 |
| WoLF: Wide-scope Large Language Model Framework for CXR Understanding | Mar 19, 2024 | Anatomy, Instruction Following | —Unverified | 0 | 0 |
| Reducing Hallucinations: Enhancing VQA for Flood Disaster Damage Assessment with Visual Contexts | Dec 21, 2023 | Hallucination, Question Answering | —Unverified | 0 | 0 |