| Transformers in Vision: A Survey | Jan 4, 2021 | Action Recognition, Activity Recognition | Unverified | 0 |
| Linguistically Routing Capsule Network for Out-of-Distribution Visual Question Answering | Jan 1, 2021 | Novel Concepts, Question Answering | Unverified | 0 |
| Hierarchical Graph Attention Network for Few-Shot Visual-Semantic Learning | Jan 1, 2021 | Graph Attention, Image Captioning | Unverified | 0 |
| Differentiable End-to-End Program Executor for Sample and Computationally Efficient VQA | Jan 1, 2021 | Question Answering, Visual Question Answering | Unverified | 0 |
| Unshuffling Data for Improved Generalization in Visual Question Answering | Jan 1, 2021 | Out-of-Distribution Generalization, Question Answering | Unverified | 0 |
| Seeing is Knowing! Fact-based Visual Question Answering using Knowledge Graph Embeddings | Dec 31, 2020 | Common Sense Reasoning, Knowledge Graph Embeddings | Unverified | 0 |
| LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding | Dec 29, 2020 | Document Image Classification, Document Layout Analysis | Code Available | 0 |
| Object-Centric Diagnosis of Visual Reasoning | Dec 21, 2020 | Diagnostic, Object | Unverified | 0 |
| Learning content and context with language bias for Visual Question Answering | Dec 21, 2020 | Question Answering, Visual Question Answering | Code Available | 0 |
| Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps | Dec 9, 2020 | Decoder, Image Captioning | Unverified | 0 |
| WeaQA: Weak Supervision via Captions for Visual Question Answering | Dec 4, 2020 | Question Answering, Visual Question Answering | Unverified | 0 |
| Towards Knowledge-Augmented Visual Question Answering | Dec 1, 2020 | General Knowledge, Graph Attention | Code Available | 0 |
| Multimodal Graph Networks for Compositional Generalization in Visual Question Answering | Dec 1, 2020 | Graph Neural Network, Question Answering | Unverified | 0 |
| A Unified Framework for Multilingual and Code-Mixed Visual Question Answering | Dec 1, 2020 | Question Answering, Visual Question Answering | Unverified | 0 |
| Learning from Lexical Perturbations for Consistent Visual Question Answering | Nov 26, 2020 | Question Answering, Visual Question Answering | Code Available | 0 |
| Siamese Tracking with Lingual Object Constraints | Nov 23, 2020 | Object, Object Tracking | Code Available | 0 |
| Modular Graph Attention Network for Complex Visual Relational Reasoning | Nov 22, 2020 | Graph Attention, Question Answering | Unverified | 0 |
| Logically Consistent Loss for Visual Question Answering | Nov 19, 2020 | Multi-Task Learning, Question Answering | Unverified | 0 |
| Generating Natural Questions from Images for Multimodal Assistants | Nov 17, 2020 | Common Sense Reasoning, Natural Questions | Unverified | 0 |
| CapWAP: Captioning with a Purpose | Nov 9, 2020 | Image Captioning, Question Answering | Unverified | 0 |
| Learning to Model and Ignore Dataset Bias with Mixed Capacity Ensembles | Nov 7, 2020 | Natural Language Inference, Question Answering | Code Available | 0 |
| An Improved Attention for Visual Question Answering | Nov 4, 2020 | Decoder, Question Answering | Code Available | 0 |
| Reasoning Over History: Context Aware Visual Dialog | Nov 2, 2020 | coreference-resolution, Coreference Resolution | Unverified | 0 |
| Can Pre-training help VQA with Lexical Variations? | Nov 1, 2020 | Question Answering, Visual Question Answering | Unverified | 0 |
| Representation, Learning and Reasoning on Spatial Language for Downstream NLP Tasks | Nov 1, 2020 | Common Sense Reasoning, Question Answering | Unverified | 0 |
| CapWAP: Image Captioning with a Purpose | Nov 1, 2020 | Image Captioning, Question Answering | Unverified | 0 |
| ISAAQ - Mastering Textbook Questions with Pre-trained Transformers and Bottom-Up and Top-Down Attention | Nov 1, 2020 | Multiple-choice, Question Answering | Unverified | 0 |
| Loss re-scaling VQA: Revisiting the Language Prior Problem from a Class-imbalance View | Oct 30, 2020 | Face Recognition, image-classification | Code Available | 0 |
| Leveraging Visual Question Answering to Improve Text-to-Image Synthesis | Oct 28, 2020 | Auxiliary Learning, Image Generation | Unverified | 0 |
| Beyond VQA: Generating Multi-word Answer and Rationale to Visual Questions | Oct 24, 2020 | General Classification, Multiple-choice | Unverified | 0 |
| SOrT-ing VQA Models : Contrastive Gradient Learning for Improved Consistency | Oct 20, 2020 | Question Answering, Visual Grounding | Code Available | 0 |
| Answer-checking in Context: A Multi-modal FullyAttention Network for Visual Question Answering | Oct 17, 2020 | Question Answering, Visual Question Answering | Unverified | 0 |
| New Ideas and Trends in Deep Multimodal Content Understanding: A Review | Oct 16, 2020 | Cross-Modal Retrieval, Deep Learning | Unverified | 0 |
| Does my multimodal model learn cross-modal interactions? It's harder to tell than you might think! | Oct 13, 2020 | Diagnostic, Image-text Classification | Unverified | 0 |
| Interpretable Neural Computation for Real-World Compositional Visual Question Answering | Oct 10, 2020 | Question Answering, Visual Question Answering | Unverified | 0 |
| Characterizing Datasets for Social Visual Question Answering, and the New TinySocial Dataset | Oct 8, 2020 | Question Answering, Visual Question Answering | Unverified | 0 |
| Pathological Visual Question Answering | Oct 6, 2020 | AI Agent, Question Answering | Unverified | 0 |
| Finding the Evidence: Localization-aware Answer Prediction for Text Visual Question Answering | Oct 6, 2020 | Optical Character Recognition, Optical Character Recognition (OCR) | Unverified | 0 |
| Attention Guided Semantic Relationship Parsing for Visual Question Answering | Oct 5, 2020 | Object, Question Answering | Unverified | 0 |
| CAPTION: Correction by Analyses, POS-Tagging and Interpretation of Objects using only Nouns | Oct 2, 2020 | Image Captioning, object-detection | Unverified | 0 |
| ISAAQ -- Mastering Textbook Questions with Pre-trained Transformers and Bottom-Up and Top-Down Attention | Oct 1, 2020 | Multiple-choice, Question Answering | Unverified | 0 |
| Graph-based Heuristic Search for Module Selection Procedure in Neural Module Network | Sep 30, 2020 | Heuristic Search, Question Answering | Unverified | 0 |
| Spatial Attention as an Interface for Image Captioning Models | Sep 29, 2020 | Image Captioning, Question Answering | Unverified | 0 |
| Hierarchical Deep Multi-modal Network for Medical Visual Question Answering | Sep 27, 2020 | Descriptive, Medical Visual Question Answering | Code Available | 0 |
| Multiple interaction learning with question-type prior knowledge for constraining answer search space in visual question answering | Sep 23, 2020 | Question Answering, Visual Question Answering | Code Available | 0 |
| Regularizing Attention Networks for Anomaly Detection in Visual Question Answering | Sep 21, 2020 | Anomaly Detection, Question Answering | Unverified | 0 |
| A Multimodal Memes Classification: A Survey and Open Research Issues | Sep 17, 2020 | Classification, General Classification | Unverified | 0 |
| Cross-modal Knowledge Reasoning for Knowledge-based Visual Question Answering | Aug 31, 2020 | Knowledge Graphs, Question Answering | Unverified | 0 |
| Visual Question Answering on Image Sets | Aug 27, 2020 | Question Answering, Visual Question Answering | Unverified | 0 |
| Document Visual Question Answering Challenge 2020 | Aug 20, 2020 | Question Answering, Retrieval | Unverified | 0 |