| OmniFusion Technical Report | Apr 9, 2024 | MM-Vet, TextVQA | Code Available | 0 | 5 |
| DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue | Nov 17, 2019 | feature selection, Question Answering | Code Available | 0 | 5 |
| Dual Recurrent Attention Units for Visual Question Answering | Feb 1, 2018 | Question Answering, Visual Question Answering | Code Available | 0 | 5 |
| Bridging Vision and Language Spaces with Assignment Prediction | Apr 15, 2024 | Cross-Modal Retrieval, Image Captioning | Code Available | 0 | 5 |
| Adversarial Training with OCR Modality Perturbation for Scene-Text Visual Question Answering | Mar 14, 2024 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available | 0 | 5 |
| OG-SGG: Ontology-Guided Scene Graph Generation. A Case Study in Transfer Learning for Telepresence Robotics | Feb 21, 2022 | BIG-bench Machine Learning, Graph Generation | Code Available | 0 | 5 |
| On Modality Bias Recognition and Reduction | Feb 25, 2022 | Action Recognition, Multi-modal Classification | Code Available | 0 | 5 |
| Dual Attention Networks for Visual Reference Resolution in Visual Dialog | Feb 25, 2019 | AI Agent, Question Answering | Code Available | 0 | 5 |
| Dual Attention Networks for Multimodal Reasoning and Matching | Nov 2, 2016 | Collaborative Inference, Image-text matching | Code Available | 0 | 5 |
| Object Attribute Matters in Visual Question Answering | Dec 20, 2023 | Attribute, Graph Neural Network | Code Available | 0 | 5 |
| DrishtiKon: Multi-Granular Visual Grounding for Text-Rich Document Images | Jun 26, 2025 | document understanding, Optical Character Recognition (OCR) | Code Available | 0 | 5 |
| Object-aware Adaptive-Positivity Learning for Audio-Visual Question Answering | Dec 20, 2023 | Audio-visual Question Answering, Audio-Visual Question Answering (AVQA) | Code Available | 0 | 5 |
| Towards Flexible Evaluation for Generative Visual Question Answering | Aug 1, 2024 | Decoder, Generative Visual Question Answering | Code Available | 0 | 5 |
| Answer Them All! Toward Universal Visual Question Answering Models | Mar 1, 2019 | All, Question Answering | Code Available | 0 | 5 |
| Neural Module Networks | Nov 9, 2015 | Visual Question Answering, Visual Question Answering (VQA) | Code Available | 0 | 5 |
| Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding | Oct 4, 2018 | Question Answering, Representation Learning | Code Available | 0 | 5 |
| Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering | Dec 1, 2017 | Question Answering, Visual Question Answering | Code Available | 0 | 5 |
| Answer Questions with Right Image Regions: A Visual Attention Regularization Approach | Feb 3, 2021 | Question Answering, Visual Grounding | Code Available | 0 | 5 |
| Self-Bootstrapped Visual-Language Model for Knowledge Selection and Question Answering | Apr 22, 2024 | Language Modeling, Language Modelling | Code Available | 0 | 5 |
| NeSyCoCo: A Neuro-Symbolic Concept Composer for Compositional Generalization | Dec 20, 2024 | Compositional Generalization (AVG), Novel Concepts | Code Available | 0 | 5 |
| No Images, No Problem: Retaining Knowledge in Continual VQA with Questions-Only Memory | Feb 6, 2025 | Continual Learning, Question Answering | Code Available | 0 | 5 |
| MUTAN: Multimodal Tucker Fusion for Visual Question Answering | May 18, 2017 | Visual Question Answering, Visual Question Answering (VQA) | Code Available | 0 | 5 |
| Does Chain-of-Thought Reasoning Help Mobile GUI Agent? An Empirical Study | Mar 21, 2025 | Attribute, Mathematical Problem-Solving | Code Available | 0 | 5 |
| Music's Multimodal Complexity in AVQA: Why We Need More than General Multimodal LLMs | May 27, 2025 | Audio-visual Question Answering, Question Answering | Code Available | 0 | 5 |
| Multi-Sourced Compositional Generalization in Visual Question Answering | May 29, 2025 | Question Answering, Visual Question Answering | Code Available | 0 | 5 |