| SLAKE: A Semantically-Labeled Knowledge-Enhanced Dataset for Medical Visual Question Answering | Feb 18, 2021 | Medical Visual Question Answering, Question Answering | Code Available | 1 |
| Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts | Feb 17, 2021 | Caption Generation, Diversity | Code Available | 1 |
| Unifying Vision-and-Language Tasks via Text Generation | Feb 4, 2021 | Conditional Text Generation, Decoder | Code Available | 1 |
| VisualMRC: Machine Reading Comprehension on Document Images | Jan 27, 2021 | Machine Reading Comprehension, Natural Language Understanding | Code Available | 1 |
| TRAR: Routing the Attention Spans in Transformer for Visual Question Answering | Jan 1, 2021 | Question Answering, Referring Expression | Code Available | 1 |
| Pano-AVQA: Grounded Audio-Visual Question Answering on 360° Videos | Jan 1, 2021 | Audio-visual Question Answering, Question Answering | Code Available | 1 |
| Multimodal Co-Attention Transformer for Survival Prediction in Gigapixel Whole Slide Images | Jan 1, 2021 | Attribute, Multiple Instance Learning | Code Available | 1 |
| Detecting Hate Speech in Multi-modal Memes | Dec 29, 2020 | Binary Classification, Hate Speech Detection | Code Available | 1 |
| Overcoming Language Priors with Self-supervised Learning for Visual Question Answering | Dec 17, 2020 | Question Answering, Self-Supervised Learning | Code Available | 1 |
| Knowledge-Routed Visual Question Reasoning: Challenges for Deep Representation Embedding | Dec 14, 2020 | Question Answering, Visual Question Answering | Code Available | 1 |
| FloodNet: A High Resolution Aerial Imagery Dataset for Post Flood Scene Understanding | Dec 5, 2020 | Image Classification | Code Available | 1 |
| Just Ask: Learning to Answer Questions from Millions of Narrated Videos | Dec 1, 2020 | Question Answering, Question Generation | Code Available | 1 |
| Point and Ask: Incorporating Pointing into Visual Question Answering | Nov 27, 2020 | Question Answering, Visual Question Answering | Code Available | 1 |
| Large Scale Multimodal Classification Using an Ensemble of Transformer Models and Co-Attention | Nov 23, 2020 | Classification, General Classification | Code Available | 1 |
| LRTA: A Transparent Neural-Symbolic Reasoning Framework with Modular Supervision for Visual Question Answering | Nov 21, 2020 | Answer Generation, Question Answering | Code Available | 1 |
| Disentangling 3D Prototypical Networks For Few-Shot Concept Learning | Nov 6, 2020 | 3D geometry, 3D Object Detection | Code Available | 1 |
| Learning to Contrast the Counterfactual Samples for Robust Visual Question Answering | Nov 1, 2020 | Contrastive Learning, counterfactual | Code Available | 1 |
| ConceptBert: Concept-Aware Representation for Visual Question Answering | Nov 1, 2020 | Common Sense Reasoning, Question Answering | Code Available | 1 |
| MMFT-BERT: Multimodal Fusion Transformer with BERT Encodings for Visual Question Answering | Oct 27, 2020 | Diagnostic, Question Answering | Code Available | 1 |
| RUArt: A Novel Text-Centered Solution for Text-Based Visual Question Answering | Oct 24, 2020 | Optical Character Recognition (OCR) | Code Available | 1 |
| Removing Bias in Multi-modal Classifiers: Regularization by Maximizing Functional Entropies | Oct 21, 2020 | Question Answering, Visual Question Answering | Code Available | 1 |
| Bayesian Attention Modules | Oct 20, 2020 | Image Captioning, Machine Translation | Code Available | 1 |
| Natural Language Rationales with Full-Stack Visual Reasoning: From Pixels to Semantic Frames to Commonsense Graphs | Oct 15, 2020 | Language Modeling | Code Available | 1 |
| Contrast and Classify: Training Robust VQA Models | Oct 13, 2020 | Contrastive Learning, Data Augmentation | Code Available | 1 |
| X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers | Sep 23, 2020 | Image Captioning, Image Generation | Code Available | 1 |
| MUTANT: A Training Paradigm for Out-of-Distribution Generalization in Visual Question Answering | Sep 18, 2020 | Out-of-Distribution Generalization, Question Answering | Code Available | 1 |
| A Comparison of Pre-trained Vision-and-Language Models for Multimodal Representation Learning across Medical Images and Reports | Sep 3, 2020 | Image-text Retrieval, Medical Visual Question Answering | Code Available | 1 |
| A Dataset and Baselines for Visual Question Answering on Art | Aug 28, 2020 | Question Answering, Question Generation | Code Available | 1 |
| DeVLBert: Learning Deconfounded Visio-Linguistic Representations | Aug 16, 2020 | Image Retrieval, Question Answering | Code Available | 1 |
| Semantic Equivalent Adversarial Data Augmentation for Visual Question Answering | Jul 19, 2020 | Adversarial Attack, Data Augmentation | Code Available | 1 |
| Learning to Discretely Compose Reasoning Module Networks for Video Captioning | Jul 17, 2020 | Decoder, Question Answering | Code Available | 1 |
| DocVQA: A Dataset for VQA on Document Images | Jul 1, 2020 | Question Answering, Reading Comprehension | Code Available | 1 |
| Ontology-guided Semantic Composition for Zero-Shot Learning | Jun 30, 2020 | Image Classification | Code Available | 1 |
| Graph Optimal Transport for Cross-Domain Alignment | Jun 26, 2020 | Graph Matching, Image Captioning | Code Available | 1 |
| Sparse and Continuous Attention Mechanisms | Jun 12, 2020 | Machine Translation, Question Answering | Code Available | 1 |
| Large-Scale Adversarial Training for Vision-and-Language Representation Learning | Jun 11, 2020 | Image-text Retrieval, Question Answering | Code Available | 1 |
| Closed Loop Neural-Symbolic Learning via Integrating Neural Perception, Grammar Parsing, and Symbolic Reasoning | Jun 11, 2020 | Question Answering, Reinforcement Learning (RL) | Code Available | 1 |
| Roses Are Red, Violets Are Blue... but Should VQA Expect Them To? | Jun 9, 2020 | Question Answering, Visual Question Answering | Code Available | 1 |
| Attention-Based Context Aware Reasoning for Situation Recognition | Jun 1, 2020 | Action Recognition, Fine-grained Action Recognition | Code Available | 1 |
| Cross-Modality Relevance for Reasoning on Language and Vision | May 12, 2020 | Question Answering, Visual Question Answering | Code Available | 1 |
| COBRA: Contrastive Bi-Modal Representation Algorithm | May 7, 2020 | Cross-Modal Retrieval, Image Captioning | Code Available | 1 |
| Dynamic Language Binding in Relational Visual Reasoning | Apr 30, 2020 | Object, Question Answering | Code Available | 1 |
| Deep Multimodal Neural Architecture Search | Apr 25, 2020 | Decoder, Image-text matching | Code Available | 1 |
| Visual Grounding Methods for VQA are Working for the Wrong Reasons! | Apr 12, 2020 | Question Answering, Visual Grounding | Code Available | 1 |
| Evaluating Multimodal Representations on Visual Semantic Textual Similarity | Apr 4, 2020 | Benchmarking, Image Captioning | Code Available | 1 |
| Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers | Apr 2, 2020 | Image-text matching, Image-text Retrieval | Code Available | 1 |
| X-Linear Attention Networks for Image Captioning | Mar 31, 2020 | Decoder, Fine-Grained Visual Recognition | Code Available | 1 |
| Ground Truth Evaluation of Neural Network Explanations with CLEVR-XAI | Mar 16, 2020 | Benchmarking, Explainable Artificial Intelligence (XAI) | Code Available | 1 |
| Counterfactual Samples Synthesizing for Robust Visual Question Answering | Mar 14, 2020 | counterfactual, Question Answering | Code Available | 1 |
| PathVQA: 30000+ Questions for Medical Visual Question Answering | Mar 7, 2020 | AI Agent, Medical Visual Question Answering | Code Available | 1 |