| Query and Attention Augmentation for Knowledge-Based Explainable Reasoning | Jan 1, 2022 | Question Answering, Visual Question Answering | Code Available | 0 |
| Dynamic Memory Networks for Visual and Textual Question Answering | Mar 4, 2016 | Question Answering, Visual Question Answering | Code Available | 0 |
| LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking | Apr 18, 2022 | cross-modal alignment, Document AI | Code Available | 0 |
| LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding | Dec 29, 2020 | Document Image Classification, Document Layout Analysis | Code Available | 0 |
| An Improved Attention for Visual Question Answering | Nov 4, 2020 | Decoder, Question Answering | Code Available | 0 |
| TIJO: Trigger Inversion with Joint Optimization for Defending Multimodal Backdoored Models | Aug 7, 2023 | backdoor defense, object-detection | Code Available | 0 |
| TimeCausality: Evaluating the Causal Ability in Time Dimension for Vision Language Models | May 21, 2025 | Human Aging, Question Answering | Code Available | 0 |
| Adaptively Clustering Neighbor Elements for Image-Text Generation | Jan 5, 2023 | Clustering, Decoder | Code Available | 0 |
| Dynamic Key-value Memory Enhanced Multi-step Graph Reasoning for Knowledge-based Visual Question Answering | Mar 6, 2022 | Graph Attention, Question Answering | Code Available | 0 |
| DVQA: Understanding Data Visualizations via Question Answering | Jan 24, 2018 | Articles, Chart Question Answering | Code Available | 0 |
| CLEVR_HYP: A Challenge Dataset and Baselines for Visual Question Answering with Hypothetical Actions over Images | Jun 1, 2021 | Question Answering, Visual Question Answering | Code Available | 0 |
| Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge | Aug 9, 2017 | GPU, Visual Question Answering | Code Available | 0 |
| CLEVR_HYP: A Challenge Dataset and Baselines for Visual Question Answering with Hypothetical Actions over Images | Apr 13, 2021 | Question Answering, Visual Question Answering | Code Available | 0 |
| CLEAR: A Dataset for Compositional Language and Elementary Acoustic Reasoning | Nov 26, 2018 | Acoustic Question Answering, Question Answering | Code Available | 0 |
| QuIIL at T3 challenge: Towards Automation in Life-Saving Intervention Procedures from First-Person View | Jul 18, 2024 | Action Anticipation, Action Recognition | Code Available | 0 |
| Latent Alignment and Variational Attention | Jul 10, 2018 | Hard Attention, Machine Translation | Code Available | 0 |
| Large Models in Dialogue for Active Perception and Anomaly Detection | Jan 27, 2025 | Anomaly Detection, Question Answering | Code Available | 0 |
| Large Language Models Understand Layout | Jul 8, 2024 | Question Answering, Visual Question Answering | Code Available | 0 |
| Kvasir-VQA-x1: A Multimodal Dataset for Medical Reasoning and Robust MedVQA in Gastrointestinal Endoscopy | Jun 11, 2025 | Medical Visual Question Answering, Question Answering | Code Available | 0 |
| DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue | Nov 17, 2019 | feature selection, Question Answering | Code Available | 0 |
| Dual Recurrent Attention Units for Visual Question Answering | Feb 1, 2018 | Question Answering, Visual Question Answering | Code Available | 0 |
| Diversify, Rationalize, and Combine: Ensembling Multiple QA Strategies for Zero-shot Knowledge-based VQA | Jun 18, 2024 | Question Answering, Visual Question Answering | Code Available | 0 |
| Dual Attention Networks for Visual Reference Resolution in Visual Dialog | Feb 25, 2019 | AI Agent, Question Answering | Code Available | 0 |
| RAVENEA: A Benchmark for Multimodal Retrieval-Augmented Visual Culture Understanding | May 20, 2025 | Image Captioning, Question Answering | Code Available | 0 |
| CAST: Cross-modal Alignment Similarity Test for Vision Language Models | Sep 17, 2024 | cross-modal alignment, Question Answering | Code Available | 0 |
| RealCQA: Scientific Chart Question Answering as a Test-bed for First-Order Logic | Aug 3, 2023 | Chart Question Answering, Formal Logic | Code Available | 0 |
| Kvasir-VQA: A Text-Image Pair GI Tract Dataset | Sep 2, 2024 | Image Captioning, Image Generation | Code Available | 0 |
| A Neuro-Symbolic ASP Pipeline for Visual Question Answering | May 16, 2022 | Question Answering, Visual Question Answering | Code Available | 0 |
| KOFFVQA: An Objectively Evaluated Free-form VQA Benchmark for Large Vision-Language Models in the Korean Language | Mar 31, 2025 | Form, Question Answering | Code Available | 0 |
| Knowledge Generation for Zero-shot Knowledge-based VQA | Feb 4, 2024 | Question Answering, Visual Question Answering | Code Available | 0 |
| Toward Multi-Granularity Decision-Making: Explicit Visual Reasoning with Hierarchical Knowledge | Jan 1, 2023 | Decision Making, Question Answering | Code Available | 0 |
| Towards Addressing the Misalignment of Object Proposal Evaluation for Vision-Language Tasks via Semantic Grounding | Sep 1, 2023 | Graph Generation, Image Captioning | Code Available | 0 |
| Dual Attention Networks for Multimodal Reasoning and Matching | Nov 2, 2016 | Collaborative Inference, Image-text matching | Code Available | 0 |
| Recommending Themes for Ad Creative Design via Visual-Linguistic Representations | Jan 20, 2020 | Question Answering, Recommendation Systems | Code Available | 0 |
| DrishtiKon: Multi-Granular Visual Grounding for Text-Rich Document Images | Jun 26, 2025 | document understanding, Optical Character Recognition (OCR) | Code Available | 0 |
| Recursive Visual Attention in Visual Dialog | Dec 6, 2018 | Question Answering, Visual Dialog | Code Available | 0 |
| Knowledge Acquisition Disentanglement for Knowledge-based Visual Question Answering with Large Language Models | Jul 22, 2024 | Disentanglement, Question Answering | Code Available | 0 |
| ReDiT: Re-evaluating large visual question answering model confidence by defining input scenario Difficulty and applying Temperature mapping | Jan 6, 2025 | Question Answering, Visual Question Answering | Code Available | 0 |
| Towards a performance analysis on pre-trained Visual Question Answering models for autonomous driving | Jul 18, 2023 | Autonomous Driving, Model Selection | Code Available | 0 |
| Cascaded Mutual Modulation for Visual Reasoning | Sep 6, 2018 | Question Answering, Visual Question Answering | Code Available | 0 |
| Knowing Earlier what Right Means to You: A Comprehensive VQA Dataset for Grounding Relative Directions via Multi-Task Learning | Jul 6, 2022 | Diagnostic, Multi-Task Learning | Code Available | 0 |
| Reframing Spatial Reasoning Evaluation in Language Models: A Real-World Simulation Benchmark for Qualitative Reasoning | May 23, 2024 | Logical Reasoning Question Answering, Spatial Reasoning | Code Available | 0 |
| Towards a Unified Multimodal Reasoning Framework | Dec 22, 2023 | Multimodal Reasoning, Multiple-choice | Code Available | 0 |
| Relation-Aware Graph Attention Network for Visual Question Answering | Mar 29, 2019 | Graph Attention, Implicit Relations | Code Available | 0 |
| 'Just because you are right, doesn't mean I am wrong': Overcoming a Bottleneck in the Development and Evaluation of Open-Ended Visual Question Answering (VQA) Tasks | Mar 28, 2021 | Question Answering, Visual Question Answering | Code Available | 0 |
| Adaptive loose optimization for robust question answering | May 6, 2023 | Extractive Question-Answering, Machine Reading Comprehension | Code Available | 0 |
| REMIND Your Neural Network to Prevent Catastrophic Forgetting | Oct 6, 2019 | Quantization, Question Answering | Code Available | 0 |
| Bridging Vision and Language Spaces with Assignment Prediction | Apr 15, 2024 | Cross-Modal Retrieval, Image Captioning | Code Available | 0 |
| Joint Visual and Text Prompting for Improved Object-Centric Perception with Multimodal Large Language Models | Apr 6, 2024 | MME, Object | Code Available | 0 |
| Joint Answering and Explanation for Visual Commonsense Reasoning | Feb 25, 2022 | Knowledge Distillation, Question Answering | Code Available | 0 |