| A Dual-Attention Learning Network with Word and Sentence Embedding for Medical Visual Question Answering | Oct 1, 2022 | Medical Visual Question Answering, Question Answering | Code Available | 0 | 5 |
| No Images, No Problem: Retaining Knowledge in Continual VQA with Questions-Only Memory | Feb 6, 2025 | Continual Learning, Question Answering | Code Available | 0 | 5 |
| Object Attribute Matters in Visual Question Answering | Dec 20, 2023 | Attribute, Graph Neural Network | Code Available | 0 | 5 |
| Overcoming Language Priors in Visual Question Answering via Distinguishing Superficially Similar Instances | Sep 18, 2022 | Attribute, Question Answering | Code Available | 0 | 5 |
| CXReasonBench: A Benchmark for Evaluating Structured Diagnostic Reasoning in Chest X-rays | May 23, 2025 | Diagnostic, Question Answering | Code Available | 0 | 5 |
| NAAQA: A Neural Architecture for Acoustic Question Answering | Jun 11, 2021 | Acoustic Question Answering, Question Answering | Code Available | 0 | 5 |
| MUTAN: Multimodal Tucker Fusion for Visual Question Answering | May 18, 2017 | Visual Question Answering, Visual Question Answering (VQA) | Code Available | 0 | 5 |
| cViL: Cross-Lingual Training of Vision-Language Models using Knowledge Distillation | Jun 7, 2022 | Knowledge Distillation, Question Answering | Code Available | 0 | 5 |
| AdCare-VLM: Leveraging Large Vision Language Model (LVLM) to Monitor Long-Term Medication Adherence and Care | May 1, 2025 | Language Modeling, Language Modelling | Code Available | 0 | 5 |
| MUREL: Multimodal Relational Reasoning for Visual Question Answering | Feb 25, 2019 | Relational Reasoning, Visual Question Answering | Code Available | 0 | 5 |
| Multi-Sourced Compositional Generalization in Visual Question Answering | May 29, 2025 | Question Answering, Visual Question Answering | Code Available | 0 | 5 |
| Barlow constrained optimization for Visual Question Answering | Mar 7, 2022 | Question Answering, Visual Question Answering | Code Available | 0 | 5 |
| Music's Multimodal Complexity in AVQA: Why We Need More than General Multimodal LLMs | May 27, 2025 | Audio-visual Question Answering, Question Answering | Code Available | 0 | 5 |
| Multi-Page Document Visual Question Answering using Self-Attention Scoring Mechanism | Apr 29, 2024 | document understanding, GPU | Code Available | 0 | 5 |
| Multiple interaction learning with question-type prior knowledge for constraining answer search space in visual question answering | Sep 23, 2020 | Question Answering, Visual Question Answering | Code Available | 0 | 5 |
| AdaVQA: Overcoming Language Priors with Adapted Margin Cosine Loss | May 5, 2021 | Question Answering, Visual Question Answering | Code Available | 0 | 5 |
| Multimodal Residual Learning for Visual QA | Jun 5, 2016 | Multiple-choice, Question Answering | Code Available | 0 | 5 |
| NeSyCoCo: A Neuro-Symbolic Concept Composer for Compositional Generalization | Dec 20, 2024 | Compositional Generalization (AVG), Novel Concepts | Code Available | 0 | 5 |
| Multimodal Large Language Models and Tunings: Vision, Language, Sensors, Audio, and Beyond | Oct 8, 2024 | Question Answering, Visual Question Answering | Code Available | 0 | 5 |
| Cross-Modal Contrastive Learning for Robust Reasoning in VQA | Nov 21, 2022 | Contrastive Learning, Question Answering | Code Available | 0 | 5 |
| BabelBench: An Omni Benchmark for Code-Driven Analysis of Multimodal and Multistructured Data | Oct 1, 2024 | Code Generation, Logical Reasoning | Code Available | 0 | 5 |
| Cross-Lingual Text-Rich Visual Comprehension: An Information Theory Perspective | Dec 23, 2024 | Question Answering, Visual Question Answering | Code Available | 0 | 5 |
| AVQACL: A Novel Benchmark for Audio-Visual Question Answering Continual Learning | Jan 1, 2025 | Audio-visual Question Answering, Continual Learning | Code Available | 0 | 5 |
| Multimodal Explanations: Justifying Decisions and Pointing to the Evidence | Feb 15, 2018 | Activity Recognition, Explainable Models | Code Available | 0 | 5 |
| Multi-modal Factorized Bilinear Pooling with Co-Attention Learning for Visual Question Answering | Aug 4, 2017 | Question Answering, Visual Question Answering | Code Available | 0 | 5 |