| Removing Bias in Multi-modal Classifiers: Regularization by Maximizing Functional Entropies | Oct 21, 2020 | Question Answering, Visual Question Answering | Code Available | 1 |
| Bayesian Attention Modules | Oct 20, 2020 | Image Captioning, Machine Translation | Code Available | 1 |
| SOrT-ing VQA Models : Contrastive Gradient Learning for Improved Consistency | Oct 20, 2020 | Question Answering, Visual Grounding | Code Available | 0 |
| Answer-checking in Context: A Multi-modal FullyAttention Network for Visual Question Answering | Oct 17, 2020 | Question Answering, Visual Question Answering | Unverified | 0 |
| New Ideas and Trends in Deep Multimodal Content Understanding: A Review | Oct 16, 2020 | Cross-Modal Retrieval, Deep Learning | Unverified | 0 |
| Natural Language Rationales with Full-Stack Visual Reasoning: From Pixels to Semantic Frames to Commonsense Graphs | Oct 15, 2020 | Language Modeling, Language Modelling | Code Available | 1 |
| Does my multimodal model learn cross-modal interactions? It's harder to tell than you might think! | Oct 13, 2020 | Diagnostic, Image-text Classification | Unverified | 0 |
| Contrast and Classify: Training Robust VQA Models | Oct 13, 2020 | Contrastive Learning, Data Augmentation | Code Available | 1 |
| Interpretable Neural Computation for Real-World Compositional Visual Question Answering | Oct 10, 2020 | Question Answering, Visual Question Answering | Unverified | 0 |
| Characterizing Datasets for Social Visual Question Answering, and the New TinySocial Dataset | Oct 8, 2020 | Question Answering, Visual Question Answering | Unverified | 0 |
| Pathological Visual Question Answering | Oct 6, 2020 | AI Agent, Question Answering | Unverified | 0 |
| Finding the Evidence: Localization-aware Answer Prediction for Text Visual Question Answering | Oct 6, 2020 | Optical Character Recognition, Optical Character Recognition (OCR) | Unverified | 0 |
| Attention Guided Semantic Relationship Parsing for Visual Question Answering | Oct 5, 2020 | Object, Question Answering | Unverified | 0 |
| CAPTION: Correction by Analyses, POS-Tagging and Interpretation of Objects using only Nouns | Oct 2, 2020 | Image Captioning, object-detection | Unverified | 0 |
| ISAAQ -- Mastering Textbook Questions with Pre-trained Transformers and Bottom-Up and Top-Down Attention | Oct 1, 2020 | Multiple-choice, Question Answering | Unverified | 0 |
| Graph-based Heuristic Search for Module Selection Procedure in Neural Module Network | Sep 30, 2020 | Heuristic Search, Question Answering | Unverified | 0 |
| Spatial Attention as an Interface for Image Captioning Models | Sep 29, 2020 | Image Captioning, Question Answering | Unverified | 0 |
| Hierarchical Deep Multi-modal Network for Medical Visual Question Answering | Sep 27, 2020 | Descriptive, Medical Visual Question Answering | Code Available | 0 |
| Multiple interaction learning with question-type prior knowledge for constraining answer search space in visual question answering | Sep 23, 2020 | Question Answering, Visual Question Answering | Code Available | 0 |
| X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers | Sep 23, 2020 | Image Captioning, Image Generation | Code Available | 1 |
| Regularizing Attention Networks for Anomaly Detection in Visual Question Answering | Sep 21, 2020 | Anomaly Detection, Question Answering | Unverified | 0 |
| MUTANT: A Training Paradigm for Out-of-Distribution Generalization in Visual Question Answering | Sep 18, 2020 | Out-of-Distribution Generalization, Question Answering | Code Available | 1 |
| A Multimodal Memes Classification: A Survey and Open Research Issues | Sep 17, 2020 | Classification, General Classification | Unverified | 0 |
| A Comparison of Pre-trained Vision-and-Language Models for Multimodal Representation Learning across Medical Images and Reports | Sep 3, 2020 | Image-text Retrieval, Medical Visual Question Answering | Code Available | 1 |
| Cross-modal Knowledge Reasoning for Knowledge-based Visual Question Answering | Aug 31, 2020 | Knowledge Graphs, Question Answering | Unverified | 0 |