| Multimodal fusion of imaging and genomics for lung cancer recurrence prediction | Feb 5, 2020 | Computed Tomography (CT), Question Answering | Code Available | 1 |
| Fine-grained Image Classification and Retrieval by Combining Visual and Locally Pooled Textual Features | Jan 14, 2020 | Classification, Diversity | Code Available | 1 |
| In Defense of Grid Features for Visual Question Answering | Jan 10, 2020 | Image Captioning, Question Answering | Code Available | 1 |
| Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline | Dec 5, 2019 | Language Modelling, Representation Learning | Code Available | 1 |
| Overcoming Data Limitation in Medical Visual Question Answering | Sep 26, 2019 | Denoising, Medical Visual Question Answering | Code Available | 1 |
| UNITER: UNiversal Image-TExt Representation Learning | Sep 25, 2019 | Image-text matching, Image-text Retrieval | Code Available | 1 |
| Don't Take the Easy Way Out: Ensemble Based Methods for Avoiding Known Dataset Biases | Sep 9, 2019 | Natural Language Inference, Question Answering | Code Available | 1 |
| VL-BERT: Pre-training of Generic Visual-Linguistic Representations | Aug 22, 2019 | Image-text matching, Language Modelling | Code Available | 1 |
| LXMERT: Learning Cross-Modality Encoder Representations from Transformers | Aug 20, 2019 | Language Modelling | Code Available | 1 |
| ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks | Aug 6, 2019 | Image Retrieval, Question Answering | Code Available | 1 |
| OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge | May 31, 2019 | Object Detection | Code Available | 1 |
| Scene Text Visual Question Answering | May 31, 2019 | Question Answering, Visual Question Answering | Code Available | 1 |
| Gated Hierarchical Attention for Image Captioning | Oct 30, 2018 | Decoder, Image Captioning | Code Available | 1 |
| Faithful Multimodal Explanation for Visual Question Answering | Sep 8, 2018 | Explanatory Visual Question Answering, Question Answering | Code Available | 1 |
| R-VQA: Learning Visual Relation Facts with Semantic Attention for Visual Question Answering | May 24, 2018 | Question Answering, Relation | Code Available | 1 |
| AI2-THOR: An Interactive 3D Environment for Visual AI | Dec 14, 2017 | Deep Reinforcement Learning, Imitation Learning | Code Available | 1 |
| Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments | Nov 20, 2017 | Reinforcement Learning, Translation | Code Available | 1 |
| Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering | Jul 25, 2017 | Image Captioning, Visual Question Answering | Code Available | 1 |
| Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning | Mar 20, 2017 | Deep Reinforcement Learning, Reinforcement Learning | Code Available | 1 |
| CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning | Dec 20, 2016 | Diagnostic, Question Answering | Code Available | 1 |
| Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization | Oct 7, 2016 | General Classification, Image Attribution | Code Available | 1 |
| Hierarchical Question-Image Co-Attention for Visual Question Answering | May 31, 2016 | Visual Dialog, Visual Question Answering | Code Available | 1 |
| VQA: Visual Question Answering | May 3, 2015 | Image Captioning, Multiple-choice | Code Available | 1 |
| Barriers in Integrating Medical Visual Question Answering into Radiology Workflows: A Scoping Review and Clinicians' Insights | Jul 9, 2025 | Diagnostic, Medical Visual Question Answering | Unverified | 0 |
| LinguaMark: Do Multimodal Models Speak Fairly? A Benchmark-Based Evaluation | Jul 9, 2025 | Question Answering, Visual Question Answering | Unverified | 0 |