| Attention Mechanism based Cognition-level Scene Understanding | Apr 17, 2022 | Question Answering, Scene Understanding | Unverified | 0 | 0 |
| Toward 3D Spatial Reasoning for Human-like Text-based Visual Question Answering | Sep 21, 2022 | Image Captioning, Optical Character Recognition (OCR) | Unverified | 0 | 0 |
| Toward Effective Reinforcement Learning Fine-Tuning for Medical VQA in Vision-Language Models | May 20, 2025 | Medical Visual Question Answering, Question Answering | Unverified | 0 | 0 |
| VQA-MHUG: A Gaze Dataset to Study Multimodal Neural Attention in Visual Question Answering | Sep 27, 2021 | Question Answering, Visual Question Answering | Unverified | 0 | 0 |
| 2nd Place Solution to the GQA Challenge 2019 | Jul 16, 2019 | Question Answering, Visual Question Answering | Unverified | 0 | 0 |
| Attention Guided Semantic Relationship Parsing for Visual Question Answering | Oct 5, 2020 | Object, Question Answering | Unverified | 0 | 0 |
| A Thousand Words Are Worth More Than a Picture: Natural Language-Centric Outside-Knowledge Visual Question Answering | Jan 14, 2022 | Generative Question Answering, Image to text | Unverified | 0 | 0 |
| VQA Training Sets are Self-play Environments for Generating Few-shot Pools | May 30, 2024 | Question Answering, Visual Question Answering | Unverified | 0 | 0 |
| Towards a Unified Model for Generating Answers and Explanations in Visual Question Answering | Jan 25, 2023 | Decoder, Explanation Generation | Unverified | 0 | 0 |
| VQAttack: Transferable Adversarial Attacks on Visual Question Answering via Pre-trained Models | Feb 16, 2024 | Adversarial Robustness, Language Modelling | Unverified | 0 | 0 |
| Towards Automated Error Analysis: Learning to Characterize Errors | Jan 13, 2022 | Common Sense Reasoning, Meta-Learning | Unverified | 0 | 0 |
| Towards Causal VQA: Revealing and Reducing Spurious Correlations by Invariant and Covariant Semantic Editing | Dec 16, 2019 | Question Answering, Visual Question Answering | Unverified | 0 | 0 |
| Towards Complex Document Understanding By Discrete Reasoning | Jul 25, 2022 | document understanding, Question Answering | Unverified | 0 | 0 |
| Towards Developing a Multilingual and Code-Mixed Visual Question Answering System by Knowledge Distillation | Sep 10, 2021 | Knowledge Distillation, Question Answering | Unverified | 0 | 0 |
| Actively Seeking and Learning from Live Data | Apr 5, 2019 | Domain Adaptation, Meta-Learning | Unverified | 0 | 0 |
| Towards Escaping from Language Bias and OCR Error: Semantics-Centered Text Visual Question Answering | Mar 24, 2022 | Optical Character Recognition, Optical Character Recognition (OCR) | Unverified | 0 | 0 |
| A survey on VQA: Datasets and Approaches | May 2, 2021 | Question Answering, Survey | Unverified | 0 | 0 |
| A Survey of Vision-Language Pre-training from the Lens of Multimodal Machine Translation | Jun 12, 2023 | Image Captioning, Machine Translation | Unverified | 0 | 0 |
| A Study on Multimodal and Interactive Explanations for Visual Question Answering | Mar 1, 2020 | Explainable Artificial Intelligence (XAI), Prediction | Unverified | 0 | 0 |
| A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision | Mar 30, 2023 | Decoder, Multi-Task Learning | Unverified | 0 | 0 |
| Towards General Purpose Vision Systems: An End-to-End Task-Agnostic Vision-Language Architecture | Jan 1, 2022 | Question Answering, Visual Question Answering | Unverified | 0 | 0 |
| Towards Grounded Visual Spatial Reasoning in Multi-Modal Vision Language Models | Aug 18, 2023 | Image-text matching, Object Localization | Unverified | 0 | 0 |
| AstroLLaVA: towards the unification of astronomical data and natural language | Apr 11, 2025 | Astronomy, Image Captioning | Unverified | 0 | 0 |
| Towards Human-Level Understanding of Complex Process Engineering Schematics: A Pedagogical, Introspective Multi-Agent Framework for Open-Domain Question Answering | Aug 24, 2024 | knowledge editing, Open-Domain Question Answering | Unverified | 0 | 0 |
| VQA with Cascade of Self- and Co-Attention Blocks | Feb 28, 2023 | Question Answering, Visual Question Answering | Unverified | 0 | 0 |