| Overcoming Language Priors in Visual Question Answering via Distinguishing Superficially Similar Instances | Sep 18, 2022 | AttributeQuestion Answering | CodeCode Available | 0 | 5 |
| Answering Diverse Questions via Text Attached with Key Audio-Visual Clues | Mar 11, 2024 | Audio-visual Question AnsweringAudio-Visual Question Answering (AVQA) | CodeCode Available | 0 | 5 |
| BioD2C: A Dual-level Semantic Consistency Constraint Framework for Biomedical VQA | Mar 4, 2025 | Medical DiagnosisQuestion Answering | CodeCode Available | 0 | 5 |
| Discrete Subgraph Sampling for Interpretable Graph based Visual Question Answering | Dec 11, 2024 | Explainable artificial intelligenceExplainable Artificial Intelligence (XAI) | CodeCode Available | 0 | 5 |
| Discovering the Unknown Knowns: Turning Implicit Knowledge in the Dataset into Explicit Training Examples for Visual Question Answering | Sep 13, 2021 | Data AugmentationQuestion Answering | CodeCode Available | 0 | 5 |
| BinaryVQA: A Versatile Test Set to Evaluate the Out-of-Distribution Generalization of VQA Models | Jan 28, 2023 | Out-of-Distribution GeneralizationQuestion Answering | CodeCode Available | 0 | 5 |
| Open-Set Knowledge-Based Visual Question Answering with Inference Paths | Oct 12, 2023 | Knowledge GraphsMulti-class Classification | CodeCode Available | 0 | 5 |
| OpenViVQA: Task, Dataset, and Multimodal Fusion Models for Visual Question Answering in Vietnamese | May 7, 2023 | Information RetrievalQuestion Answering | CodeCode Available | 0 | 5 |
| Diffusion-Refined VQA Annotations for Semi-Supervised Gaze Following | Jun 4, 2024 | Question AnsweringVisual Question Answering | CodeCode Available | 0 | 5 |
| OsmLocator: locating overlapping scatter marks with a non-training generative perspective | Dec 18, 2023 | ClusteringCombinatorial Optimization | CodeCode Available | 0 | 5 |
| Outside Knowledge Conversational Video (OKCV) Dataset -- Dialoguing over Videos | Jun 11, 2025 | Question AnsweringVisual Question Answering | CodeCode Available | 0 | 5 |
| Difficult Task Yes but Simple Task No: Unveiling the Laziness in Multimodal LLMs | Oct 15, 2024 | Image DescriptionMultiple-choice | CodeCode Available | 0 | 5 |
| Differential Attention for Visual Question Answering | Apr 1, 2018 | Question AnsweringVisual Question Answering | CodeCode Available | 0 | 5 |
| Differentiable Outlier Detection Enable Robust Deep Multimodal Analysis | Feb 11, 2023 | Image-text RetrievalKnowledge Graphs | CodeCode Available | 0 | 5 |
| Did the Model Understand the Question? | May 14, 2018 | modelQuestion Answering | CodeCode Available | 0 | 5 |
| On Modality Bias Recognition and Reduction | Feb 25, 2022 | Action RecognitionMulti-modal Classification | CodeCode Available | 0 | 5 |
| Detecting Knowledge Boundary of Vision Large Language Models by Sampling-Based Inference | Feb 25, 2025 | Question AnsweringRAG | CodeCode Available | 0 | 5 |
| Open-Ended Visual Question-Answering | Oct 9, 2016 | Question AnsweringSentence | CodeCode Available | 0 | 5 |
| HumaniBench: A Human-Centric Framework for Large Multimodal Models Evaluation | May 16, 2025 | BenchmarkingEthics | CodeCode Available | 0 | 5 |
| Beyond Raw Videos: Understanding Edited Videos with Large Multimodal Model | Jun 15, 2024 | Question AnsweringVideo Understanding | CodeCode Available | 0 | 5 |
| OmniFusion Technical Report | Apr 9, 2024 | MM-VetTextVQA | CodeCode Available | 0 | 5 |
| Object-aware Adaptive-Positivity Learning for Audio-Visual Question Answering | Dec 20, 2023 | Audio-visual Question AnsweringAudio-Visual Question Answering (AVQA) | CodeCode Available | 0 | 5 |
| OG-SGG: Ontology-Guided Scene Graph Generation. A Case Study in Transfer Learning for Telepresence Robotics | Feb 21, 2022 | BIG-bench Machine LearningGraph Generation | CodeCode Available | 0 | 5 |
| Design as Desired: Utilizing Visual Question Answering for Multimodal Pre-training | Mar 30, 2024 | Contrastive LearningQuestion Answering | CodeCode Available | 0 | 5 |
| OmniNet: A unified architecture for multi-modal multi-task learning | Jul 17, 2019 | Image CaptioningMulti-Task Learning | CodeCode Available | 0 | 5 |
| P NP, at least in Visual Question Answering | Mar 26, 2020 | Question AnsweringVisual Question Answering | CodeCode Available | 0 | 5 |
| Progressive Prompt Detailing for Improved Alignment in Text-to-Image Generative Models | Mar 22, 2025 | Question AnsweringVisual Question Answering | CodeCode Available | 0 | 5 |
| An Improved Attention for Visual Question Answering | Nov 4, 2020 | DecoderQuestion Answering | CodeCode Available | 0 | 5 |
| Delving Deeper into Cross-lingual Visual Question Answering | Feb 15, 2022 | Inductive BiasQuestion Answering | CodeCode Available | 0 | 5 |
| Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding | Oct 4, 2018 | Question AnsweringRepresentation Learning | CodeCode Available | 0 | 5 |
| No Images, No Problem: Retaining Knowledge in Continual VQA with Questions-Only Memory | Feb 6, 2025 | Continual LearningQuestion Answering | CodeCode Available | 0 | 5 |
| Beyond Bilinear: Generalized Multimodal Factorized High-order Pooling for Visual Question Answering | Aug 10, 2017 | Question AnsweringVisual Question Answering | CodeCode Available | 0 | 5 |
| Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning | Mar 6, 2020 | Density EstimationNoise Estimation | CodeCode Available | 0 | 5 |
| Deep Modular Co-Attention Networks for Visual Question Answering | Jun 25, 2019 | Question AnsweringVisual Question Answering | CodeCode Available | 0 | 5 |
| Beyond Accuracy: A Consolidated Tool for Visual Question Answering Benchmarking | Oct 11, 2021 | BenchmarkingQuestion Answering | CodeCode Available | 0 | 5 |
| NAAQA: A Neural Architecture for Acoustic Question Answering | Jun 11, 2021 | Acoustic Question AnsweringQuestion Answering | CodeCode Available | 0 | 5 |
| A Neuro-Symbolic ASP Pipeline for Visual Question Answering | May 16, 2022 | Question AnsweringVisual Question Answering | CodeCode Available | 0 | 5 |
| NeSyCoCo: A Neuro-Symbolic Concept Composer for Compositional Generalization | Dec 20, 2024 | Compositional Generalization (AVG)Novel Concepts | CodeCode Available | 0 | 5 |
| Music's Multimodal Complexity in AVQA: Why We Need More than General Multimodal LLMs | May 27, 2025 | Audio-visual Question AnsweringQuestion Answering | CodeCode Available | 0 | 5 |
| BERTHop: An Effective Vision-and-Language Model for Chest X-ray Disease Diagnosis | Aug 10, 2021 | Language ModelingLanguage Modelling | CodeCode Available | 0 | 5 |
| MUTAN: Multimodal Tucker Fusion for Visual Question Answering | May 18, 2017 | Visual Question AnsweringVisual Question Answering (VQA) | CodeCode Available | 0 | 5 |
| Declarative Knowledge Distillation from Large Language Models for Visual Question Answering Datasets | Oct 12, 2024 | Knowledge DistillationQuestion Answering | CodeCode Available | 0 | 5 |
| MUREL: Multimodal Relational Reasoning for Visual Question Answering | Feb 25, 2019 | Relational ReasoningVisual Question Answering | CodeCode Available | 0 | 5 |
| Neural Module Networks | Nov 9, 2015 | Visual Question AnsweringVisual Question Answering (VQA) | CodeCode Available | 0 | 5 |
| Benchmarking Vision-Language Contrastive Methods for Medical Representation Learning | Jun 11, 2024 | BenchmarkingContrastive Learning | CodeCode Available | 0 | 5 |
| Multi-Page Document Visual Question Answering using Self-Attention Scoring Mechanism | Apr 29, 2024 | document understandingGPU | CodeCode Available | 0 | 5 |
| Dataset and Benchmark for Urdu Natural Scenes Text Detection, Recognition and Visual Question Answering | May 21, 2024 | DiversityInformation Retrieval | CodeCode Available | 0 | 5 |
| Multiple interaction learning with question-type prior knowledge for constraining answer search space in visual question answering | Sep 23, 2020 | Question AnsweringVisual Question Answering | CodeCode Available | 0 | 5 |
| Multi-Sourced Compositional Generalization in Visual Question Answering | May 29, 2025 | Question AnsweringVisual Question Answering | CodeCode Available | 0 | 5 |
| Multimodal Preference Data Synthetic Alignment with Reward Model | Dec 23, 2024 | 2kCaption Generation | CodeCode Available | 0 | 5 |