| Learning Representations of Sets through Optimized Permutations | Dec 10, 2018 | General Classification, Question Answering | Code Available | 0 |
| ViQuAE, a Dataset for Knowledge-based Visual Question Answering about Named Entities | Nov 16, 2021 | Articles, Face Recognition | Code Available | 0 |
| ClinKD: Cross-Modal Clinical Knowledge Distiller For Multi-Task Medical Images | Feb 9, 2025 | Clinical Knowledge, Medical Visual Question Answering | Code Available | 0 |
| VisFIS: Visual Feature Importance Supervision with Right-for-the-Right-Reason Objectives | Jun 22, 2022 | Feature Importance, Question Answering | Code Available | 0 |
| Learning from Lexical Perturbations for Consistent Visual Question Answering | Nov 26, 2020 | Question Answering, Visual Question Answering | Code Available | 0 |
| The Illusion of Competence: Evaluating the Effect of Explanations on Users' Mental Models of Visual Question Answering Systems | Jun 27, 2024 | Question Answering, Visual Question Answering | Code Available | 0 |
| Learning Convolutional Text Representations for Visual Question Answering | May 18, 2017 | General Classification, image-classification | Code Available | 0 |
| Attribute Diversity Determines the Systematicity Gap in VQA | Nov 15, 2023 | Attribute, Diagnostic | Code Available | 0 |
| What value do explicit high level concepts have in vision to language problems? | Jun 3, 2015 | Image Captioning, Question Answering | Code Available | 0 |
| CLEVR-Ref+: Diagnosing Visual Reasoning with Referring Expressions | Jan 3, 2019 | Diagnostic, Image Segmentation | Code Available | 0 |
| Learning content and context with language bias for Visual Question Answering | Dec 21, 2020 | Question Answering, Visual Question Answering | Code Available | 0 |
| The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision | Apr 26, 2019 | Image-text Retrieval, Object | Code Available | 0 |
| The Promise of Premise: Harnessing Question Premises in Visual Question Answering | May 1, 2017 | Question Answering, Relevance Detection | Code Available | 0 |
| Attention on Attention: Architectures for Visual Question Answering (VQA) | Mar 21, 2018 | GPU, Question Answering | Code Available | 0 |
| Dynamic Task and Weight Prioritization Curriculum Learning for Multimodal Imagery | Oct 29, 2023 | Deep Learning, Multimodal Deep Learning | Code Available | 0 |
| Ask Your Neurons: A Deep Learning Approach to Visual Question Answering | May 9, 2016 | Question Answering, Visual Question Answering | Code Available | 0 |
| Learning Conditioned Graph Structures for Interpretable Visual Question Answering | Jun 19, 2018 | Question Answering, Visual Question Answering | Code Available | 0 |
| QAVA: Query-Agnostic Visual Attack to Large Vision-Language Models | Apr 15, 2025 | Question Answering, Visual Question Answering | Code Available | 0 |
| Learning by Correction: Efficient Tuning Task for Zero-Shot Generative Vision-Language Reasoning | Apr 1, 2024 | Image Captioning, Instruction Following | Code Available | 0 |
| VL-InterpreT: An Interactive Visualization Tool for Interpreting Vision-Language Transformers | Mar 30, 2022 | Question Answering, Visual Commonsense Reasoning | Code Available | 0 |
| QLEVR: A Diagnostic Dataset for Quantificational Language and Elementary Visual Reasoning | May 6, 2022 | Diagnostic, Question Answering | Code Available | 0 |
| QLIP: A Dynamic Quadtree Vision Prior Enhances MLLM Performance Without Retraining | May 29, 2025 | Question Answering, Representation Learning | Code Available | 0 |
| Quantifying and Alleviating the Language Prior Problem in Visual Question Answering | May 13, 2019 | Information Retrieval, Question Answering | Code Available | 0 |
| Learn from Downstream and Be Yourself in Multimodal Large Language Model Fine-Tuning | Nov 17, 2024 | Image Captioning, Language Modeling | Code Available | 0 |
| Value-Spectrum: Quantifying Preferences of Vision-Language Models via Value Decomposition in Social Media Contexts | Nov 18, 2024 | Benchmarking, Multimodal Large Language Model | Code Available | 0 |