| Pragmatic Issue-Sensitive Image Captioning | Apr 29, 2020 | Descriptive, Image Captioning | Code Available | 0 |
| Enhancing Compositional Reasoning in Vision-Language Models with Synthetic Preference Data | Apr 7, 2025 | Question Answering, Visual Question Answering | Code Available | 0 |
| Learning Visual Question Answering by Bootstrapping Hard Attention | Aug 1, 2018 | Hard Attention, Question Answering | Code Available | 0 |
| Cognitive Visual Commonsense Reasoning Using Dynamic Working Memory | Jul 4, 2021 | Question Answering, Scene Understanding | Code Available | 0 |
| Learning to Reason: End-to-End Module Networks for Visual Question Answering | Apr 18, 2017 | Visual Dialog, Visual Question Answering | Code Available | 0 |
| Black-box Model Ensembling for Textual and Visual Question Answering via Information Fusion | Jul 4, 2024 | Question Answering, Visual Question Answering | Code Available | 0 |
| End-to-End Instance Segmentation with Recurrent Attention | May 30, 2016 | Autonomous Driving, Image Captioning | Code Available | 0 |
| Pre-Training Multi-Modal Dense Retrievers for Outside-Knowledge Visual Question Answering | Jun 28, 2023 | Passage Retrieval, Question Answering | Code Available | 0 |
| Pretraining Vision-Language Model for Difference Visual Question Answering in Longitudinal Chest X-rays | Feb 14, 2024 | Language Modeling, Language Modelling | Code Available | 0 |
| Learning to Model and Ignore Dataset Bias with Mixed Capacity Ensembles | Nov 7, 2020 | Natural Language Inference, Question Answering | Code Available | 0 |
| Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs | Apr 11, 2024 | Descriptive, Hallucination | Code Available | 0 |
| End-to-End Audio Visual Scene-Aware Dialog using Multimodal Attention-Based Video Features | Jun 21, 2018 | Question Answering, Video Description | Code Available | 0 |
| Probabilistic Embeddings for Frozen Vision-Language Models: Uncertainty Quantification with Gaussian Process Latent Variable Models | May 8, 2025 | Active Learning, cross-modal alignment | Code Available | 0 |
| Co-attending Regions and Detections with Multi-modal Multiplicative Embedding for VQA | Nov 18, 2017 | Form, Question Answering | Code Available | 0 |
| Learning to Follow Object-Centric Image Editing Instructions Faithfully | Oct 29, 2023 | Object, Question Answering | Code Available | 0 |
| Co-attending Free-form Regions and Detections with Multi-modal Multiplicative Feature Embedding for Visual Question Answering | Nov 18, 2017 | Form, Visual Question Answering | Code Available | 0 |
| Augmenting Visual Question Answering with Semantic Frame Information in a Multitask Learning Approach | Jan 31, 2020 | Question Answering, Visual Question Answering | Code Available | 0 |
| VinVL+L: Enriching Visual Representation with Location Context in VQA | Feb 22, 2023 | Question Answering, TAG | Code Available | 0 |
| CluMo: Cluster-based Modality Fusion Prompt for Continual Learning in Visual Question Answering | Aug 21, 2024 | Continual Learning, Question Answering | Code Available | 0 |
| TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering | Apr 14, 2017 | Question Answering, Visual Question Answering | Code Available | 0 |
| Learning to Count Objects in Natural Images for Visual Question Answering | Feb 15, 2018 | Visual Question Answering, Visual Question Answering (VQA) | Code Available | 0 |
| Effective Approaches to Batch Parallelization for Dynamic Neural Network Architectures | Jul 8, 2017 | Mixture-of-Experts, Question Answering | Code Available | 0 |
| EaSe: A Diagnostic Tool for VQA based on Answer Diversity | Jun 1, 2021 | Diagnostic, Diversity | Code Available | 0 |
| Learning the meanings of function words from grounded language using a visual question answering model | Aug 16, 2023 | Logical Reasoning, Question Answering | Code Available | 0 |
| Progressive Prompt Detailing for Improved Alignment in Text-to-Image Generative Models | Mar 22, 2025 | Question Answering, Visual Question Answering | Code Available | 0 |