| Temporal Reasoning via Audio Question Answering | Nov 21, 2019 | Audio Question AnsweringDiagnostic | CodeCode Available | 0 |
| Explanation vs Attention: A Two-Player Game to Obtain Attention for VQA | Nov 19, 2019 | Question AnsweringVisual Question Answering | —Unverified | 0 |
| DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue | Nov 17, 2019 | feature selectionQuestion Answering | CodeCode Available | 0 |
| Question-Conditioned Counterfactual Image Generation for VQA | Nov 14, 2019 | counterfactualImage Generation | —Unverified | 0 |
| Open-Ended Visual Question Answering by Multi-Modal Domain Adaptation | Nov 11, 2019 | Domain AdaptationQuestion Answering | —Unverified | 0 |
| Multimodal Intelligence: Representation Learning, Information Fusion, and Applications | Nov 10, 2019 | Caption GenerationImage Generation | —Unverified | 0 |
| Are we asking the right questions in MovieQA? | Nov 8, 2019 | Question AnsweringVisual Question Answering | —Unverified | 0 |
| Representing Movie Characters in Dialogues | Nov 1, 2019 | Question AnsweringRelation Classification | —Unverified | 0 |
| YouMakeup: A Large-Scale Domain-Specific Multimodal Dataset for Fine-Grained Semantic Comprehension | Nov 1, 2019 | Caption GenerationQuestion Answering | —Unverified | 0 |
| TAB-VCR: Tags and Attributes based Visual Commonsense Reasoning Baselines | Oct 31, 2019 | AttributeQuestion Answering | CodeCode Available | 0 |
| Learning Rich Image Region Representation for Visual Question Answering | Oct 29, 2019 | Language ModelingLanguage Modelling | —Unverified | 0 |
| Good, Better, Best: Textual Distractors Generation for Multiple-Choice Visual Question Answering via Reinforcement Learning | Oct 21, 2019 | Data AugmentationDecision Making | —Unverified | 0 |
| Enforcing Reasoning in Visual Commonsense Reasoning | Oct 21, 2019 | Question AnsweringReinforcement Learning | —Unverified | 0 |
| Neural Memory Plasticity for Anomaly Detection | Oct 12, 2019 | Anomaly DetectionEEG | —Unverified | 0 |
| Multi-modal Deep Analysis for Multimedia | Oct 11, 2019 | Multi-modal RecommendationQuestion Answering | —Unverified | 0 |
| Modulated Self-attention Convolutional Network for VQA | Oct 8, 2019 | Question AnsweringVisual Question Answering | —Unverified | 0 |
| REMIND Your Neural Network to Prevent Catastrophic Forgetting | Oct 6, 2019 | QuantizationQuestion Answering | CodeCode Available | 0 |
| SegEQA: Video Segmentation Based Visual Attention for Embodied Question Answering | Oct 1, 2019 | Embodied Question AnsweringQuestion Answering | —Unverified | 0 |
| From Strings to Things: Knowledge-Enabled VQA Model That Can Read and Reason | Oct 1, 2019 | Graph Neural NetworkQuestion Answering | —Unverified | 0 |
| On Incorporating Semantic Prior Knowledge in Deep Learning Through Embedding-Space Constraints | Sep 30, 2019 | Data AugmentationQuestion Answering | —Unverified | 0 |
| Compact Trilinear Interaction for Visual Question Answering | Sep 26, 2019 | BenchmarkingKnowledge Distillation | CodeCode Available | 0 |
| Why Does the VQA Model Answer No?: Improving Reasoning through Visual and Linguistic Inference | Sep 25, 2019 | Common Sense ReasoningQuestion Answering | —Unverified | 0 |
| On Incorporating Semantic Prior Knowlegde in Deep Learning Through Embedding-Space Constraints | Sep 25, 2019 | Data AugmentationQuestion Answering | —Unverified | 0 |
| Learning to Recognize the Unseen Visual Predicates | Sep 25, 2019 | Question AnsweringVisual Question Answering | —Unverified | 0 |
| UNITER: Learning UNiversal Image-TExt Representations | Sep 25, 2019 | Image-text matchingImage-text Retrieval | —Unverified | 0 |