| MUREL: Multimodal Relational Reasoning for Visual Question Answering | Feb 25, 2019 | Relational Reasoning, Visual Question Answering | Code Available | 0 | 5 |
| OpenViVQA: Task, Dataset, and Multimodal Fusion Models for Visual Question Answering in Vietnamese | May 7, 2023 | Information Retrieval, Question Answering | Code Available | 0 | 5 |
| Filling the Image Information Gap for VQA: Prompting Large Language Models to Proactively Ask Questions | Nov 20, 2023 | Question Answering, Visual Question Answering | Code Available | 0 | 5 |
| CLEVR-Ref+: Diagnosing Visual Reasoning with Referring Expressions | Jan 3, 2019 | Diagnostic, Image Segmentation | Code Available | 0 | 5 |
| Modulating early visual processing by language | Jul 2, 2017 | Question Answering, Visual Question Answering | Code Available | 0 | 5 |
| Towards Addressing the Misalignment of Object Proposal Evaluation for Vision-Language Tasks via Semantic Grounding | Sep 1, 2023 | Graph Generation, Image Captioning | Code Available | 0 | 5 |
| Few-Shot Multimodal Explanation for Visual Question Answering | Oct 28, 2024 | Explainable artificial intelligence, Explainable Artificial Intelligence (XAI) | Code Available | 0 | 5 |
| CLEVR_HYP: A Challenge Dataset and Baselines for Visual Question Answering with Hypothetical Actions over Images | Jun 1, 2021 | Question Answering, Visual Question Answering | Code Available | 0 | 5 |
| Active Learning for Visual Question Answering: An Empirical Study | Nov 6, 2017 | Active Learning, Visual Question Answering | Code Available | 0 | 5 |
| IQA: Visual Question Answering in Interactive Environments | Dec 9, 2017 | Navigate, Reinforcement Learning | Code Available | 0 | 5 |
| FEDMEKI: A Benchmark for Scaling Medical Foundation Models via Federated Knowledge Injection | Aug 17, 2024 | Federated Learning, Medical Visual Question Answering | Code Available | 0 | 5 |
| Federated Document Visual Question Answering: A Pilot Study | May 10, 2024 | Federated Learning, Question Answering | Code Available | 0 | 5 |
| CLEVR_HYP: A Challenge Dataset and Baselines for Visual Question Answering with Hypothetical Actions over Images | Apr 13, 2021 | Question Answering, Visual Question Answering | Code Available | 0 | 5 |
| Ask Your Neurons: A Deep Learning Approach to Visual Question Answering | May 9, 2016 | Question Answering, Visual Question Answering | Code Available | 0 | 5 |
| MM-Prompt: Cross-Modal Prompt Tuning for Continual Visual Question Answering | May 26, 2025 | Continual Learning, Question Answering | Code Available | 0 | 5 |
| Is Multimodal Vision Supervision Beneficial to Language? | Feb 10, 2023 | Image Retrieval, Natural Language Understanding | Code Available | 0 | 5 |
| Modularized Zero-shot VQA with Pre-trained Models | May 27, 2023 | object-detection, Object Detection | Code Available | 0 | 5 |
| Towards Language-guided Visual Recognition via Dynamic Convolutions | Oct 17, 2021 | Question Answering, Referring Expression | Code Available | 0 | 5 |
| MQA: Answering the Question via Robotic Manipulation | Mar 10, 2020 | Imitation Learning, Question Answering | Code Available | 0 | 5 |
| Factor Graph Attention | Apr 11, 2019 | Graph Attention, Question Answering | Code Available | 0 | 5 |
| CLEAR: A Dataset for Compositional Language and Elementary Acoustic Reasoning | Nov 26, 2018 | Acoustic Question Answering, Question Answering | Code Available | 0 | 5 |
| MIRTT: Learning Multimodal Interaction Representations from Trilinear Transformers for Visual Question Answering | Nov 1, 2021 | multimodal interaction, Multiple-choice | Code Available | 0 | 5 |
| Exploring the Effect of Primitives for Compositional Generalization in Vision-and-Language | Jan 1, 2023 | Question Answering, Self-Supervised Learning | Code Available | 0 | 5 |
| Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering | Nov 17, 2015 | Image Captioning, Question Answering | Code Available | 0 | 5 |
| Mixture-of-Subspaces in Low-Rank Adaptation | Jun 16, 2024 | Common Sense Reasoning, Image Generation | Code Available | 0 | 5 |
| Difficult Task Yes but Simple Task No: Unveiling the Laziness in Multimodal LLMs | Oct 15, 2024 | Image Description, Multiple-choice | Code Available | 0 | 5 |
| MM-PoE: Multiple Choice Reasoning via. Process of Elimination using Multi-Modal Models | Dec 10, 2024 | Multiple-choice, Question Answering | Code Available | 0 | 5 |
| Exploring Modulated Detection Transformer as a Tool for Action Recognition in Videos | Sep 21, 2022 | Action Detection, Action Recognition | Code Available | 0 | 5 |
| A simple neural network module for relational reasoning | Jun 5, 2017 | Image Retrieval with Multi-Modal Query, Question Answering | Code Available | 0 | 5 |
| MHSAN: Multi-Head Self-Attention Network for Visual Semantic Embedding | Jan 11, 2020 | Image Captioning, Image-text Retrieval | Code Available | 0 | 5 |
| A Simple Loss Function for Improving the Convergence and Accuracy of Visual Question Answering Models | Aug 2, 2017 | Question Answering, Visual Question Answering | Code Available | 0 | 5 |
| A Simple Baseline for Knowledge-Based Visual Question Answering | Oct 20, 2023 | In-Context Learning, Question Answering | Code Available | 0 | 5 |
| Treble Counterfactual VLMs: A Causal Approach to Hallucination | Mar 8, 2025 | Autonomous Driving, counterfactual | Code Available | 0 | 5 |
| Med-PMC: Medical Personalized Multi-modal Consultation with a Proactive Ask-First-Observe-Next Paradigm | Aug 16, 2024 | Decision Making, Medical Visual Question Answering | Code Available | 0 | 5 |
| MedHallTune: An Instruction-Tuning Benchmark for Mitigating Medical Hallucination in Vision-Language Models | Feb 28, 2025 | Decision Making, Hallucination | Code Available | 0 | 5 |
| Examining Gender and Racial Bias in Large Vision-Language Models Using a Novel Dataset of Parallel Images | Feb 8, 2024 | Image Captioning, Question Answering | Code Available | 0 | 5 |
| Knowing Earlier what Right Means to You: A Comprehensive VQA Dataset for Grounding Relative Directions via Multi-Task Learning | Jul 6, 2022 | Diagnostic, Multi-Task Learning | Code Available | 0 | 5 |
| Discrete Subgraph Sampling for Interpretable Graph based Visual Question Answering | Dec 11, 2024 | Explainable artificial intelligence, Explainable Artificial Intelligence (XAI) | Code Available | 0 | 5 |
| Measuring Faithful and Plausible Visual Grounding in VQA | May 24, 2023 | Question Answering, Visual Grounding | Code Available | 0 | 5 |
| Knowledge Acquisition Disentanglement for Knowledge-based Visual Question Answering with Large Language Models | Jul 22, 2024 | Disentanglement, Question Answering | Code Available | 0 | 5 |
| BioD2C: A Dual-level Semantic Consistency Constraint Framework for Biomedical VQA | Mar 4, 2025 | Medical Diagnosis, Question Answering | Code Available | 0 | 5 |
| Uncovering the Full Potential of Visual Grounding Methods in VQA | Jan 15, 2024 | Question Answering, Visual Grounding | Code Available | 0 | 5 |
| Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding | Mar 18, 2025 | document understanding, Question Answering | Code Available | 0 | 5 |
| Evaluating Fairness in Large Vision-Language Models Across Diverse Demographic Attributes and Prompts | Jun 25, 2024 | Fairness, Question Answering | Code Available | 0 | 5 |
| ArtQuest: Countering Hidden Language Biases in ArtVQA | Jan 4, 2024 | Question Answering, Visual Question Answering | Code Available | 0 | 5 |
| Evaluating Attribute Comprehension in Large Vision-Language Models | Aug 25, 2024 | Attribute, Image-text matching | Code Available | 0 | 5 |
| ERVQA: A Dataset to Benchmark the Readiness of Large Vision Language Models in Hospital Environments | Oct 8, 2024 | Decoder, Question Answering | Code Available | 0 | 5 |
| MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks | Mar 29, 2023 | Cross-Modal Retrieval, Decoder | Code Available | 0 | 5 |
| MapEval: A Map-Based Evaluation of Geo-Spatial Reasoning in Foundation Models | Dec 31, 2024 | Multiple-choice, Question Answering | Code Available | 0 | 5 |
| LXMERT Model Compression for Visual Question Answering | Oct 23, 2023 | model, Model Compression | Code Available | 0 | 5 |