| Counting Everyday Objects in Everyday Scenes | Apr 12, 2016 | ObjectObject Counting | CodeCode Available | 0 | 5 |
| Grounding Answers for Visual Questions Asked by Visually Impaired People | Feb 4, 2022 | Question AnsweringVisual Question Answering | CodeCode Available | 0 | 5 |
| Object-aware Adaptive-Positivity Learning for Audio-Visual Question Answering | Dec 20, 2023 | Audio-visual Question AnsweringAudio-Visual Question Answering (AVQA) | CodeCode Available | 0 | 5 |
| OG-SGG: Ontology-Guided Scene Graph Generation. A Case Study in Transfer Learning for Telepresence Robotics | Feb 21, 2022 | BIG-bench Machine LearningGraph Generation | CodeCode Available | 0 | 5 |
| OmniFusion Technical Report | Apr 9, 2024 | MM-VetTextVQA | CodeCode Available | 0 | 5 |
| AutoBench-V: Can Large Vision-Language Models Benchmark Themselves? | Oct 28, 2024 | BenchmarkingQuestion Answering | CodeCode Available | 0 | 5 |
| Adaptively Clustering Neighbor Elements for Image-Text Generation | Jan 5, 2023 | ClusteringDecoder | CodeCode Available | 0 | 5 |
| Object Attribute Matters in Visual Question Answering | Dec 20, 2023 | AttributeGraph Neural Network | CodeCode Available | 0 | 5 |
| No Images, No Problem: Retaining Knowledge in Continual VQA with Questions-Only Memory | Feb 6, 2025 | Continual LearningQuestion Answering | CodeCode Available | 0 | 5 |
| Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning | Mar 6, 2020 | Density EstimationNoise Estimation | CodeCode Available | 0 | 5 |
| A Unified Hallucination Mitigation Framework for Large Vision-Language Models | Sep 24, 2024 | HallucinationQuestion Answering | CodeCode Available | 0 | 5 |
| Core Tokensets for Data-efficient Sequential Training of Transformers | Oct 8, 2024 | Image Captioningimage-classification | CodeCode Available | 0 | 5 |
| Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding | Oct 4, 2018 | Question AnsweringRepresentation Learning | CodeCode Available | 0 | 5 |
| Automatic Generation of Contrast Sets from Scene Graphs: Probing the Compositional Consistency of GQA | Mar 17, 2021 | Question AnsweringRelational Reasoning | CodeCode Available | 0 | 5 |
| Copy-Move Forgery Detection and Question Answering for Remote Sensing Image | Dec 3, 2024 | Question AnsweringVisual Question Answering | CodeCode Available | 0 | 5 |
| NeSyCoCo: A Neuro-Symbolic Concept Composer for Compositional Generalization | Dec 20, 2024 | Compositional Generalization (AVG)Novel Concepts | CodeCode Available | 0 | 5 |
| Neural Module Networks | Nov 9, 2015 | Visual Question AnsweringVisual Question Answering (VQA) | CodeCode Available | 0 | 5 |
| Grad-CAM: Why did you say that? | Nov 22, 2016 | Image CaptioningVisual Question Answering | CodeCode Available | 0 | 5 |
| Convincing Rationales for Visual Question Answering Reasoning | Feb 6, 2024 | Question AnsweringVisual Question Answering | CodeCode Available | 0 | 5 |
| NAAQA: A Neural Architecture for Acoustic Question Answering | Jun 11, 2021 | Acoustic Question AnsweringQuestion Answering | CodeCode Available | 0 | 5 |
| Continual VQA for Disaster Response Systems | Sep 21, 2022 | Disaster ResponseManagement | CodeCode Available | 0 | 5 |
| Context-VQA: Towards Context-Aware and Purposeful Visual Question Answering | Jul 28, 2023 | Question AnsweringVisual Question Answering | CodeCode Available | 0 | 5 |
| Augmenting Visual Question Answering with Semantic Frame Information in a Multitask Learning Approach | Jan 31, 2020 | Question AnsweringVisual Question Answering | CodeCode Available | 0 | 5 |
| Open-Set Knowledge-Based Visual Question Answering with Inference Paths | Oct 12, 2023 | Knowledge GraphsMulti-class Classification | CodeCode Available | 0 | 5 |
| Contextual Dropout: An Efficient Sample-Dependent Dropout Module | Mar 6, 2021 | image-classificationImage Classification | CodeCode Available | 0 | 5 |
| Attribute Diversity Determines the Systematicity Gap in VQA | Nov 15, 2023 | AttributeDiagnostic | CodeCode Available | 0 | 5 |
| Consistency of Compositional Generalization across Multiple Levels | Dec 18, 2024 | Meta-LearningQuestion Answering | CodeCode Available | 0 | 5 |
| Multi-Sourced Compositional Generalization in Visual Question Answering | May 29, 2025 | Question AnsweringVisual Question Answering | CodeCode Available | 0 | 5 |
| MUREL: Multimodal Relational Reasoning for Visual Question Answering | Feb 25, 2019 | Relational ReasoningVisual Question Answering | CodeCode Available | 0 | 5 |
| Multi-Page Document Visual Question Answering using Self-Attention Scoring Mechanism | Apr 29, 2024 | document understandingGPU | CodeCode Available | 0 | 5 |
| Multiple interaction learning with question-type prior knowledge for constraining answer search space in visual question answering | Sep 23, 2020 | Question AnsweringVisual Question Answering | CodeCode Available | 0 | 5 |
| Hierarchical Deep Multi-modal Network for Medical Visual Question Answering | Sep 27, 2020 | DescriptiveMedical Visual Question Answering | CodeCode Available | 0 | 5 |
| Adaptive loose optimization for robust question answering | May 6, 2023 | Extractive Question-AnsweringMachine Reading Comprehension | CodeCode Available | 0 | 5 |
| Multimodal Residual Learning for Visual QA | Jun 5, 2016 | Multiple-choiceQuestion Answering | CodeCode Available | 0 | 5 |
| Music's Multimodal Complexity in AVQA: Why We Need More than General Multimodal LLMs | May 27, 2025 | Audio-visual Question AnsweringQuestion Answering | CodeCode Available | 0 | 5 |
| Attention on Attention: Architectures for Visual Question Answering (VQA) | Mar 21, 2018 | GPUQuestion Answering | CodeCode Available | 0 | 5 |
| Compressing And Debiasing Vision-Language Pre-Trained Models for Visual Question Answering | Oct 26, 2022 | Question AnsweringVisual Question Answering | CodeCode Available | 0 | 5 |
| Multimodal Large Language Models and Tunings: Vision, Language, Sensors, Audio, and Beyond | Oct 8, 2024 | Question AnsweringVisual Question Answering | CodeCode Available | 0 | 5 |
| Adapting Lightweight Vision Language Models for Radiological Visual Question Answering | Jun 17, 2025 | DiagnosticQuestion Answering | CodeCode Available | 0 | 5 |
| Multimodal Preference Data Synthetic Alignment with Reward Model | Dec 23, 2024 | 2kCaption Generation | CodeCode Available | 0 | 5 |
| Compositionality as Lexical Symmetry | Jan 30, 2022 | Data AugmentationInductive Bias | CodeCode Available | 0 | 5 |
| Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding | Jun 6, 2016 | Phrase GroundingVisual Grounding | CodeCode Available | 0 | 5 |
| Compositional Image-Text Matching and Retrieval by Grounding Entities | May 4, 2025 | Image CaptioningImage-text matching | CodeCode Available | 0 | 5 |
| Multimodal Explanations: Justifying Decisions and Pointing to the Evidence | Feb 15, 2018 | Activity RecognitionExplainable Models | CodeCode Available | 0 | 5 |
| Generalizing Visual Question Answering from Synthetic to Human-Written Questions via a Chain of QA with a Large Language Model | Jan 12, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 0 | 5 |
| Targeted Visual Prompting for Medical Visual Question Answering | Aug 6, 2024 | Medical Visual Question AnsweringQuestion Answering | CodeCode Available | 0 | 5 |
| Language Models Meet Anomaly Detection for Better Interpretability and Generalizability | Apr 11, 2024 | Anomaly DetectionLanguage Modelling | CodeCode Available | 0 | 5 |
| Multi-modal Factorized Bilinear Pooling with Co-Attention Learning for Visual Question Answering | Aug 4, 2017 | Question AnsweringVisual Question Answering | CodeCode Available | 0 | 5 |
| MUTAN: Multimodal Tucker Fusion for Visual Question Answering | May 18, 2017 | Visual Question AnsweringVisual Question Answering (VQA) | CodeCode Available | 0 | 5 |
| OpenViVQA: Task, Dataset, and Multimodal Fusion Models for Visual Question Answering in Vietnamese | May 7, 2023 | Information RetrievalQuestion Answering | CodeCode Available | 0 | 5 |