| Visual question answering based evaluation metrics for text-to-image generation | Nov 15, 2024 | Image Generation, Image Manipulation | Unverified | 0 | 0 |
| COIN: Counterfactual Image Generation for VQA Interpretation | Jan 10, 2022 | counterfactual, Image Generation | Unverified | 0 | 0 |
| CoG-DQA: Chain-of-Guiding Learning with Large Language Models for Diagram Question Answering | Jan 1, 2024 | Question Answering, Visual Question Answering | Unverified | 0 | 0 |
| COCO is "ALL" You Need for Visual Instruction Fine-tuning | Jan 17, 2024 | All, Image Captioning | Unverified | 0 | 0 |
| CLOVA: A Closed-Loop Visual Assistant with Tool Usage and Update | Dec 18, 2023 | Continual Learning, Question Answering | Unverified | 0 | 0 |
| Rank2Tell: A Multimodal Driving Dataset for Joint Importance Ranking and Reasoning | Sep 12, 2023 | Autonomous Vehicles, Question Answering | Unverified | 0 | 0 |
| Ranked from Within: Ranking Large Multimodal Models for Visual Question Answering Without Labels | Dec 9, 2024 | Question Answering, Visual Question Answering | Unverified | 0 | 0 |
| Visual Question Answering based on Formal Logic | Nov 8, 2021 | Formal Logic, Question Answering | Unverified | 0 | 0 |
| RAVEN: A Dataset for Relational and Analogical Visual rEasoNing | Mar 7, 2019 | Object Recognition, Question Answering | Unverified | 0 | 0 |
| Visual Question Answering based on Local-Scene-Aware Referring Expression Generation | Jan 22, 2021 | Question Answering, Referring Expression | Unverified | 0 | 0 |
| Reactive Multi-Stage Feature Fusion for Multimodal Dialogue Modeling | Aug 14, 2019 | Question Answering, Scene-Aware Dialogue | Unverified | 0 | 0 |
| Visual Question Answering Dataset for Bilingual Image Understanding: A Study of Cross-Lingual Transfer Using Attention Maps | Aug 1, 2018 | Cross-Lingual Transfer, Image Captioning | Unverified | 0 | 0 |
| CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering | Mar 1, 2025 | Continual Learning, Language Modeling | Unverified | 0 | 0 |
| Realizing Visual Question Answering for Education: GPT-4V as a Multimodal AI | May 12, 2024 | Question Answering, Visual Question Answering | Unverified | 0 | 0 |
| CLIP-UP: CLIP-Based Unanswerable Problem Detection for Visual Question Answering | Jan 2, 2025 | Multiple-choice, Question Answering | Unverified | 0 | 0 |
| CLIP-TD: CLIP Targeted Distillation for Vision-Language Tasks | Jan 15, 2022 | Question Answering, Visual Commonsense Reasoning | Unverified | 0 | 0 |
| Reasoning Over History: Context Aware Visual Dialog | Nov 2, 2020 | coreference-resolution, Coreference Resolution | Unverified | 0 | 0 |
| Recent, rapid advancement in visual question answering architecture: a review | Mar 2, 2022 | Question Answering, Visual Question Answering | Unverified | 0 | 0 |
| Reciprocal Attention Fusion for Visual Question Answering | May 11, 2018 | Object, Question Answering | Unverified | 0 | 0 |
| Zero-Shot Visual Question Answering | Nov 17, 2016 | Question Answering, Retrieval | Unverified | 0 | 0 |
| Recurrent and Contextual Models for Visual Question Answering | Mar 23, 2017 | Diversity, Multiple-choice | Unverified | 0 | 0 |
| Visual Question Answering for Cultural Heritage | Mar 22, 2020 | Question Answering, Visual Question Answering | Unverified | 0 | 0 |
| CLIP-Powered TASS: Target-Aware Single-Stream Network for Audio-Visual Question Answering | May 13, 2024 | Audio-visual Question Answering, Audio-Visual Question Answering (AVQA) | Unverified | 0 | 0 |
| WoLF: Wide-scope Large Language Model Framework for CXR Understanding | Mar 19, 2024 | Anatomy, Instruction Following | Unverified | 0 | 0 |
| Reducing Hallucinations: Enhancing VQA for Flood Disaster Damage Assessment with Visual Contexts | Dec 21, 2023 | Hallucination, Question Answering | Unverified | 0 | 0 |
| Reducing Language Biases in Visual Question Answering with Visually-Grounded Question Encoder | Jul 13, 2020 | Question Answering, Visual Grounding | Unverified | 0 | 0 |
| CLIP Models are Few-shot Learners: Empirical Studies on VQA and Visual Entailment | Mar 14, 2022 | parameter-efficient fine-tuning, Question Answering | Unverified | 0 | 0 |
| Visual question answering: from early developments to recent advances -- a survey | Jan 7, 2025 | Descriptive, Natural Language Understanding | Unverified | 0 | 0 |
| Regularizing Attention Networks for Anomaly Detection in Visual Question Answering | Sep 21, 2020 | Anomaly Detection, Question Answering | Unverified | 0 | 0 |
| Visual Question Answering in Ophthalmology: A Progressive and Practical Perspective | Oct 22, 2024 | Question Answering, Visual Question Answering | Unverified | 0 | 0 |
| CLEVR-POC: Reasoning-Intensive Visual Question Answering in Partially Observable Environments | Mar 5, 2024 | Language Modelling, Large Language Model | Unverified | 0 | 0 |
| ReLoop: "Seeing Twice and Thinking Backwards" via Closed-loop Training to Mitigate Hallucinations in Multimodal understanding | Jul 7, 2025 | Hallucination, Question Answering | Unverified | 0 | 0 |
| Visual Question Answering in Remote Sensing with Cross-Attention and Multimodal Information Bottleneck | Jun 25, 2023 | object-detection, Object Detection | Unverified | 0 | 0 |
| Remote Sensing Vision-Language Foundation Models without Annotations via Ground Remote Alignment | Dec 12, 2023 | image-classification, Image Classification | Unverified | 0 | 0 |
| CL-CrossVQA: A Continual Learning Benchmark for Cross-Domain Visual Question Answering | Nov 19, 2022 | Continual Learning, Question Answering | Unverified | 0 | 0 |
| Claude 3.5 Sonnet Model Card Addendum | Jun 24, 2024 | Code Generation, MMR total | Unverified | 0 | 0 |
| Rephrasing visual questions by specifying the entropy of the answer distribution | Apr 10, 2020 | Question Answering, Visual Question Answering | Unverified | 0 | 0 |
| Representation, Learning and Reasoning on Spatial Language for Downstream NLP Tasks | Nov 1, 2020 | Common Sense Reasoning, Question Answering | Unverified | 0 | 0 |
| Representing Movie Characters in Dialogues | Nov 1, 2019 | Question Answering, Relation Classification | Unverified | 0 | 0 |
| Reproducibility Report for "Learning To Count Objects In Natural Images For Visual Question Answering" | May 21, 2018 | Question Answering, Visual Question Answering | Unverified | 0 | 0 |
| RepsNet: Combining Vision with Language for Automated Medical Reports | Sep 27, 2022 | Contrastive Learning, Decoder | Unverified | 0 | 0 |
| RescueADI: Adaptive Disaster Interpretation in Remote Sensing Images with Autonomous Agents | Oct 17, 2024 | Question Answering, Task Planning | Unverified | 0 | 0 |
| Visual Question Answering Instruction: Unlocking Multimodal Large Language Model To Domain-Specific Visual Multitasks | Feb 13, 2024 | Language Modeling, Language Modelling | Unverified | 0 | 0 |
| CLAMP: Contrastive LAnguage Model Prompt-tuning | Dec 4, 2023 | Contrastive Learning, Image Captioning | Unverified | 0 | 0 |
| Reassessing Evaluation Practices in Visual Question Answering: A Case Study on Out-of-Distribution Generalization | May 24, 2022 | Image Captioning, Out-of-Distribution Generalization | Unverified | 0 | 0 |
| Rethinking Visual Prompting for Multimodal Large Language Models with External Knowledge | Jul 5, 2024 | Instance Segmentation, Optical Character Recognition (OCR) | Unverified | 0 | 0 |
| VrR-VG: Refocusing Visually-Relevant Relationships | Feb 1, 2019 | Image Captioning, Question Answering | Unverified | 0 | 0 |
| Retrieval-Augmented Natural Language Reasoning for Explainable Visual Question Answering | Aug 30, 2024 | Decoder, Language Modeling | Unverified | 0 | 0 |
| CIC: A Framework for Culturally-Aware Image Captioning | Feb 8, 2024 | Descriptive, Image Captioning | Unverified | 0 | 0 |
| Retrieval-Augmented Visual Question Answering via Built-in Autoregressive Search Engines | Feb 23, 2025 | Answer Generation, Language Modeling | Unverified | 0 | 0 |