| Multimodal Graph Networks for Compositional Generalization in Visual Question Answering | Dec 1, 2020 | Graph Neural Network, Question Answering | Unverified | 0 |
| Point and Ask: Incorporating Pointing into Visual Question Answering | Nov 27, 2020 | Question Answering, Visual Question Answering | Code Available | 1 |
| Learning from Lexical Perturbations for Consistent Visual Question Answering | Nov 26, 2020 | Question Answering, Visual Question Answering | Code Available | 0 |
| Siamese Tracking with Lingual Object Constraints | Nov 23, 2020 | Object, Object Tracking | Code Available | 0 |
| Large Scale Multimodal Classification Using an Ensemble of Transformer Models and Co-Attention | Nov 23, 2020 | Classification, General Classification | Code Available | 1 |
| Modular Graph Attention Network for Complex Visual Relational Reasoning | Nov 22, 2020 | Graph Attention, Question Answering | Unverified | 0 |
| LRTA: A Transparent Neural-Symbolic Reasoning Framework with Modular Supervision for Visual Question Answering | Nov 21, 2020 | Answer Generation, Question Answering | Code Available | 1 |
| Logically Consistent Loss for Visual Question Answering | Nov 19, 2020 | Multi-Task Learning, Question Answering | Unverified | 0 |
| Generating Natural Questions from Images for Multimodal Assistants | Nov 17, 2020 | Common Sense Reasoning, Natural Questions | Unverified | 0 |
| CapWAP: Captioning with a Purpose | Nov 9, 2020 | Image Captioning, Question Answering | Unverified | 0 |
| Learning to Model and Ignore Dataset Bias with Mixed Capacity Ensembles | Nov 7, 2020 | Natural Language Inference, Question Answering | Code Available | 0 |
| Disentangling 3D Prototypical Networks For Few-Shot Concept Learning | Nov 6, 2020 | 3D geometry, 3D Object Detection | Code Available | 1 |
| An Improved Attention for Visual Question Answering | Nov 4, 2020 | Decoder, Question Answering | Code Available | 0 |
| Reasoning Over History: Context Aware Visual Dialog | Nov 2, 2020 | coreference-resolution, Coreference Resolution | Unverified | 0 |
| Representation, Learning and Reasoning on Spatial Language for Downstream NLP Tasks | Nov 1, 2020 | Common Sense Reasoning, Question Answering | Unverified | 0 |
| Can Pre-training help VQA with Lexical Variations? | Nov 1, 2020 | Question Answering, Visual Question Answering | Unverified | 0 |
| ConceptBert: Concept-Aware Representation for Visual Question Answering | Nov 1, 2020 | Common Sense Reasoning, Question Answering | Code Available | 1 |
| CapWAP: Image Captioning with a Purpose | Nov 1, 2020 | Image Captioning, Question Answering | Unverified | 0 |
| ISAAQ - Mastering Textbook Questions with Pre-trained Transformers and Bottom-Up and Top-Down Attention | Nov 1, 2020 | Multiple-choice, Question Answering | Unverified | 0 |
| Learning to Contrast the Counterfactual Samples for Robust Visual Question Answering | Nov 1, 2020 | Contrastive Learning, counterfactual | Code Available | 1 |
| Loss re-scaling VQA: Revisiting the Language Prior Problem from a Class-imbalance View | Oct 30, 2020 | Face Recognition, image-classification | Code Available | 0 |
| Leveraging Visual Question Answering to Improve Text-to-Image Synthesis | Oct 28, 2020 | Auxiliary Learning, Image Generation | Unverified | 0 |
| MMFT-BERT: Multimodal Fusion Transformer with BERT Encodings for Visual Question Answering | Oct 27, 2020 | Diagnostic, Question Answering | Code Available | 1 |
| RUArt: A Novel Text-Centered Solution for Text-Based Visual Question Answering | Oct 24, 2020 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available | 1 |
| Beyond VQA: Generating Multi-word Answer and Rationale to Visual Questions | Oct 24, 2020 | General Classification, Multiple-choice | Unverified | 0 |