| 'Just because you are right, doesn't mean I am wrong': Overcoming a Bottleneck in the Development and Evaluation of Open-Ended Visual Question Answering (VQA) Tasks | Mar 28, 2021 | Question Answering, Visual Question Answering | Code Available | 0 |
| Generating and Evaluating Explanations of Attended and Error-Inducing Input Regions for VQA Models | Mar 26, 2021 | Question Answering, Visual Question Answering | Unverified | 0 |
| Visual Grounding Strategies for Text-Only Natural Language Processing | Mar 25, 2021 | Image Retrieval, Language Modeling | Unverified | 0 |
| Multi-Modal Answer Validation for Knowledge-Based VQA | Mar 23, 2021 | Question Answering, Retrieval | Code Available | 1 |
| How to Design Sample and Computationally Efficient VQA Models | Mar 22, 2021 | Question Answering, Visual Question Answering | Unverified | 0 |
| A Comprehensive Survey of Scene Graphs: Generation and Application | Mar 17, 2021 | Image Captioning, Question Answering | Unverified | 0 |
| Automatic Generation of Contrast Sets from Scene Graphs: Probing the Compositional Consistency of GQA | Mar 17, 2021 | Question Answering, Relational Reasoning | Code Available | 0 |
| Characterizing Misclassifications of Deep NLP Models | Mar 12, 2021 | Named Entity Recognition | Unverified | 0 |
| RL-CSDia: Representation Learning of Computer Science Diagrams | Mar 10, 2021 | Question Answering, Representation Learning | Unverified | 0 |
| Select, Substitute, Search: A New Benchmark for Knowledge-Augmented Visual Question Answering | Mar 9, 2021 | Optical Character Recognition (OCR), Question Answering | Code Available | 0 |
| Contextual Dropout: An Efficient Sample-Dependent Dropout Module | Mar 6, 2021 | Image Classification | Code Available | 0 |
| Visual Question Answering: which investigated applications? | Mar 4, 2021 | Image Captioning, Question Answering | Code Available | 0 |
| Learning Reasoning Paths over Semantic Graphs for Video-grounded Dialogues | Mar 1, 2021 | Question Answering, Visual Question Answering | Unverified | 0 |
| Learning Compositional Representation for Few-shot Visual Question Answering | Feb 21, 2021 | Attribute, Question Answering | Unverified | 0 |
| Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer | Feb 18, 2021 | Decoder, Document Image Classification | Code Available | 1 |
| SLAKE: A Semantically-Labeled Knowledge-Enhanced Dataset for Medical Visual Question Answering | Feb 18, 2021 | Medical Visual Question Answering, Question Answering | Code Available | 1 |
| Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts | Feb 17, 2021 | Caption Generation, Diversity | Code Available | 1 |
| Unifying Vision-and-Language Tasks via Text Generation | Feb 4, 2021 | Conditional Text Generation, Decoder | Code Available | 1 |
| Answer Questions with Right Image Regions: A Visual Attention Regularization Approach | Feb 3, 2021 | Question Answering, Visual Grounding | Code Available | 0 |
| An Empirical Study on the Generalization Power of Neural Representations Learned via Visual Guessing Games | Jan 31, 2021 | Question Answering, Visual Question Answering | Unverified | 0 |
| VisualMRC: Machine Reading Comprehension on Document Images | Jan 27, 2021 | Machine Reading Comprehension, Natural Language Understanding | Code Available | 1 |
| Unanswerable Questions about Images and Texts | Jan 25, 2021 | Question Answering, Visual Question Answering | Unverified | 0 |
| Visual Question Answering based on Local-Scene-Aware Referring Expression Generation | Jan 22, 2021 | Question Answering, Referring Expression | Unverified | 0 |
| Understanding in Artificial Intelligence | Jan 17, 2021 | Natural Language Understanding, Question Answering | Unverified | 0 |
| Latent Variable Models for Visual Question Answering | Jan 16, 2021 | Benchmarking, Question Answering | Unverified | 0 |
| Understanding the Role of Scene Graphs in Visual Question Answering | Jan 14, 2021 | Graph Generation, Question Answering | Unverified | 0 |
| Predicting Relative Depth between Objects from Semantic Features | Jan 12, 2021 | Question Answering, Visual Question Answering | Unverified | 0 |
| Self Supervision for Attention Networks | Jan 6, 2021 | Image Classification | Code Available | 0 |
| Transformers in Vision: A Survey | Jan 4, 2021 | Action Recognition, Activity Recognition | Unverified | 0 |
| Hierarchical Graph Attention Network for Few-Shot Visual-Semantic Learning | Jan 1, 2021 | Graph Attention, Image Captioning | Unverified | 0 |
| Unshuffling Data for Improved Generalization in Visual Question Answering | Jan 1, 2021 | Out-of-Distribution Generalization, Question Answering | Unverified | 0 |
| MDETR - Modulated Detection for End-to-End Multi-Modal Understanding | Jan 1, 2021 | Phrase Grounding, Question Answering | Code Available | 2 |
| Linguistically Routing Capsule Network for Out-of-Distribution Visual Question Answering | Jan 1, 2021 | Novel Concepts, Question Answering | Unverified | 0 |
| Pano-AVQA: Grounded Audio-Visual Question Answering on 360° Videos | Jan 1, 2021 | Audio-visual Question Answering, Question Answering | Code Available | 1 |
| TRAR: Routing the Attention Spans in Transformer for Visual Question Answering | Jan 1, 2021 | Question Answering, Referring Expression | Code Available | 1 |
| Multimodal Co-Attention Transformer for Survival Prediction in Gigapixel Whole Slide Images | Jan 1, 2021 | Attribute, Multiple Instance Learning | Code Available | 1 |
| Differentiable End-to-End Program Executor for Sample and Computationally Efficient VQA | Jan 1, 2021 | Question Answering, Visual Question Answering | Unverified | 0 |
| Seeing is Knowing! Fact-based Visual Question Answering using Knowledge Graph Embeddings | Dec 31, 2020 | Common Sense Reasoning, Knowledge Graph Embeddings | Unverified | 0 |
| Detecting Hate Speech in Multi-modal Memes | Dec 29, 2020 | Binary Classification, Hate Speech Detection | Code Available | 1 |
| LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding | Dec 29, 2020 | Document Image Classification, Document Layout Analysis | Code Available | 0 |
| Learning content and context with language bias for Visual Question Answering | Dec 21, 2020 | Question Answering, Visual Question Answering | Code Available | 0 |
| Object-Centric Diagnosis of Visual Reasoning | Dec 21, 2020 | Diagnostic, Object | Unverified | 0 |
| Overcoming Language Priors with Self-supervised Learning for Visual Question Answering | Dec 17, 2020 | Question Answering, Self-Supervised Learning | Code Available | 1 |
| Knowledge-Routed Visual Question Reasoning: Challenges for Deep Representation Embedding | Dec 14, 2020 | Question Answering, Visual Question Answering | Code Available | 1 |
| Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps | Dec 9, 2020 | Decoder, Image Captioning | Unverified | 0 |
| FloodNet: A High Resolution Aerial Imagery Dataset for Post Flood Scene Understanding | Dec 5, 2020 | Image Classification | Code Available | 1 |
| WeaQA: Weak Supervision via Captions for Visual Question Answering | Dec 4, 2020 | Question Answering, Visual Question Answering | Unverified | 0 |
| Towards Knowledge-Augmented Visual Question Answering | Dec 1, 2020 | General Knowledge, Graph Attention | Code Available | 0 |
| A Unified Framework for Multilingual and Code-Mixed Visual Question Answering | Dec 1, 2020 | Question Answering, Visual Question Answering | Unverified | 0 |
| Just Ask: Learning to Answer Questions from Millions of Narrated Videos | Dec 1, 2020 | Question Answering, Question Generation | Code Available | 1 |