| Core Tokensets for Data-efficient Sequential Training of Transformers | Oct 8, 2024 | Image Captioning, image-classification | Code Available | 0 | 5 |
| Copy-Move Forgery Detection and Question Answering for Remote Sensing Image | Dec 3, 2024 | Question Answering, Visual Question Answering | Code Available | 0 | 5 |
| Multi-Page Document Visual Question Answering using Self-Attention Scoring Mechanism | Apr 29, 2024 | document understanding, GPU | Code Available | 0 | 5 |
| Multiple interaction learning with question-type prior knowledge for constraining answer search space in visual question answering | Sep 23, 2020 | Question Answering, Visual Question Answering | Code Available | 0 | 5 |
| MUREL: Multimodal Relational Reasoning for Visual Question Answering | Feb 25, 2019 | Relational Reasoning, Visual Question Answering | Code Available | 0 | 5 |
| Grad-CAM: Why did you say that? | Nov 22, 2016 | Image Captioning, Visual Question Answering | Code Available | 0 | 5 |
| Convincing Rationales for Visual Question Answering Reasoning | Feb 6, 2024 | Question Answering, Visual Question Answering | Code Available | 0 | 5 |
| Counting Everyday Objects in Everyday Scenes | Apr 12, 2016 | Object, Object Counting | Code Available | 0 | 5 |
| Multimodal Residual Learning for Visual QA | Jun 5, 2016 | Multiple-choice, Question Answering | Code Available | 0 | 5 |
| Continual VQA for Disaster Response Systems | Sep 21, 2022 | Disaster Response, Management | Code Available | 0 | 5 |
| Context-VQA: Towards Context-Aware and Purposeful Visual Question Answering | Jul 28, 2023 | Question Answering, Visual Question Answering | Code Available | 0 | 5 |
| Augmenting Visual Question Answering with Semantic Frame Information in a Multitask Learning Approach | Jan 31, 2020 | Question Answering, Visual Question Answering | Code Available | 0 | 5 |
| Music's Multimodal Complexity in AVQA: Why We Need More than General Multimodal LLMs | May 27, 2025 | Audio-visual Question Answering, Question Answering | Code Available | 0 | 5 |
| Automatic Generation of Contrast Sets from Scene Graphs: Probing the Compositional Consistency of GQA | Mar 17, 2021 | Question Answering, Relational Reasoning | Code Available | 0 | 5 |
| OmniFusion Technical Report | Apr 9, 2024 | MM-Vet, TextVQA | Code Available | 0 | 5 |
| Contextual Dropout: An Efficient Sample-Dependent Dropout Module | Mar 6, 2021 | image-classification, Image Classification | Code Available | 0 | 5 |
| Multimodal Large Language Models and Tunings: Vision, Language, Sensors, Audio, and Beyond | Oct 8, 2024 | Question Answering, Visual Question Answering | Code Available | 0 | 5 |
| Attribute Diversity Determines the Systematicity Gap in VQA | Nov 15, 2023 | Attribute, Diagnostic | Code Available | 0 | 5 |
| Consistency of Compositional Generalization across Multiple Levels | Dec 18, 2024 | Meta-Learning, Question Answering | Code Available | 0 | 5 |
| Multi-modal Factorized Bilinear Pooling with Co-Attention Learning for Visual Question Answering | Aug 4, 2017 | Question Answering, Visual Question Answering | Code Available | 0 | 5 |
| Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding | Jun 6, 2016 | Phrase Grounding, Visual Grounding | Code Available | 0 | 5 |
| Multimodal Explanations: Justifying Decisions and Pointing to the Evidence | Feb 15, 2018 | Activity Recognition, Explainable Models | Code Available | 0 | 5 |
| Adaptive loose optimization for robust question answering | May 6, 2023 | Extractive Question-Answering, Machine Reading Comprehension | Code Available | 0 | 5 |
| Multimodal Preference Data Synthetic Alignment with Reward Model | Dec 23, 2024 | 2k, Caption Generation | Code Available | 0 | 5 |
| Attention on Attention: Architectures for Visual Question Answering (VQA) | Mar 21, 2018 | GPU, Question Answering | Code Available | 0 | 5 |