| On the General Value of Evidence, and Bilingual Scene-Text Visual Question Answering | Feb 24, 2020 | Question Answering, Referring Expression | Unverified | 0 |
| VQA-LOL: Visual Question Answering under the Lens of Logic | Feb 19, 2020 | Negation, Question Answering | Unverified | 0 |
| CQ-VQA: Visual Question Answering on Categorized Questions | Feb 17, 2020 | Question Answering, Visual Question Answering | Unverified | 0 |
| Sparse and Structured Visual Attention | Feb 13, 2020 | Image Captioning, Question Answering | Code Available | 0 |
| Component Analysis for Visual Question Answering Architectures | Feb 12, 2020 | Question Answering, Representation Learning | Unverified | 0 |
| Augmenting Visual Question Answering with Semantic Frame Information in a Multitask Learning Approach | Jan 31, 2020 | Question Answering, Visual Question Answering | Code Available | 0 |
| Robust Explanations for Visual Question Answering | Jan 23, 2020 | Question Answering, Visual Question Answering | Code Available | 0 |
| Uncertainty based Class Activation Maps for Visual Question Answering | Jan 23, 2020 | Deep Learning, Probabilistic Deep Learning | Unverified | 0 |
| Recommending Themes for Ad Creative Design via Visual-Linguistic Representations | Jan 20, 2020 | Question Answering, Recommendation Systems | Code Available | 0 |
| Accuracy vs. Complexity: A Trade-off in Visual Question Answering Models | Jan 20, 2020 | Question Answering, Visual Question Answering | Unverified | 0 |
| MHSAN: Multi-Head Self-Attention Network for Visual Semantic Embedding | Jan 11, 2020 | Image Captioning, Image-text Retrieval | Code Available | 0 |
| Visual Question Answering on 360° Images | Jan 10, 2020 | Question Answering, Visual Question Answering | Unverified | 0 |
| Multi-Layer Content Interaction Through Quaternion Product For Visual Question Answering | Jan 3, 2020 | Question Answering, Video Description | Unverified | 0 |
| Vision and Language: from Visual Perception to Content Creation | Dec 26, 2019 | Decoder, Question Answering | Unverified | 0 |
| Deep Exemplar Networks for VQA and VQG | Dec 19, 2019 | Decoder, Question Answering | Unverified | 0 |
| Towards Causal VQA: Revealing and Reducing Spurious Correlations by Invariant and Covariant Semantic Editing | Dec 16, 2019 | Question Answering, Visual Question Answering | Unverified | 0 |
| AI2D-RST: A multimodal corpus of 1000 primary school science diagrams | Dec 9, 2019 | Question Answering, Visual Question Answering | Unverified | 0 |
| Weak Supervision helps Emergence of Word-Object Alignment and improves Vision-Language Tasks | Dec 6, 2019 | Image Retrieval, Inductive Bias | Unverified | 0 |
| 12-in-1: Multi-Task Vision and Language Representation Learning | Dec 5, 2019 | 10-shot image generation, Image Retrieval | Code Available | 0 |
| Deep Bayesian Active Learning for Multiple Correct Outputs | Dec 2, 2019 | Active Learning, Answer Generation | Unverified | 0 |
| TAB-VCR: Tags and Attributes based VCR Baselines | Dec 1, 2019 | Attribute, Question Answering | Code Available | 0 |
| RUBi: Reducing Unimodal Biases for Visual Question Answering | Dec 1, 2019 | Question Answering, Visual Question Answering | Code Available | 0 |
| Assessing the Robustness of Visual Question Answering Models | Nov 30, 2019 | Question Answering, Visual Question Answering | Unverified | 0 |
| A Free Lunch in Generating Datasets: Building a VQG and VQA System with Attention and Humans in the Loop | Nov 30, 2019 | Question Answering, Question Generation | Unverified | 0 |
| Unsupervised Keyword Extraction for Full-sentence VQA | Nov 23, 2019 | Keyword Extraction, Question Answering | Unverified | 0 |
| Temporal Reasoning via Audio Question Answering | Nov 21, 2019 | Audio Question Answering, Diagnostic | Code Available | 0 |
| Explanation vs Attention: A Two-Player Game to Obtain Attention for VQA | Nov 19, 2019 | Question Answering, Visual Question Answering | Unverified | 0 |
| DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue | Nov 17, 2019 | feature selection, Question Answering | Code Available | 0 |
| Question-Conditioned Counterfactual Image Generation for VQA | Nov 14, 2019 | counterfactual, Image Generation | Unverified | 0 |
| Open-Ended Visual Question Answering by Multi-Modal Domain Adaptation | Nov 11, 2019 | Domain Adaptation, Question Answering | Unverified | 0 |
| Multimodal Intelligence: Representation Learning, Information Fusion, and Applications | Nov 10, 2019 | Caption Generation, Image Generation | Unverified | 0 |
| Are we asking the right questions in MovieQA? | Nov 8, 2019 | Question Answering, Visual Question Answering | Unverified | 0 |
| Representing Movie Characters in Dialogues | Nov 1, 2019 | Question Answering, Relation Classification | Unverified | 0 |
| YouMakeup: A Large-Scale Domain-Specific Multimodal Dataset for Fine-Grained Semantic Comprehension | Nov 1, 2019 | Caption Generation, Question Answering | Unverified | 0 |
| TAB-VCR: Tags and Attributes based Visual Commonsense Reasoning Baselines | Oct 31, 2019 | Attribute, Question Answering | Code Available | 0 |
| Learning Rich Image Region Representation for Visual Question Answering | Oct 29, 2019 | Language Modeling, Language Modelling | Unverified | 0 |
| Good, Better, Best: Textual Distractors Generation for Multiple-Choice Visual Question Answering via Reinforcement Learning | Oct 21, 2019 | Data Augmentation, Decision Making | Unverified | 0 |
| Enforcing Reasoning in Visual Commonsense Reasoning | Oct 21, 2019 | Question Answering, Reinforcement Learning | Unverified | 0 |
| Neural Memory Plasticity for Anomaly Detection | Oct 12, 2019 | Anomaly Detection, EEG | Unverified | 0 |
| Multi-modal Deep Analysis for Multimedia | Oct 11, 2019 | Multi-modal Recommendation, Question Answering | Unverified | 0 |
| Modulated Self-attention Convolutional Network for VQA | Oct 8, 2019 | Question Answering, Visual Question Answering | Unverified | 0 |
| REMIND Your Neural Network to Prevent Catastrophic Forgetting | Oct 6, 2019 | Quantization, Question Answering | Code Available | 0 |
| SegEQA: Video Segmentation Based Visual Attention for Embodied Question Answering | Oct 1, 2019 | Embodied Question Answering, Question Answering | Unverified | 0 |
| From Strings to Things: Knowledge-Enabled VQA Model That Can Read and Reason | Oct 1, 2019 | Graph Neural Network, Question Answering | Unverified | 0 |
| On Incorporating Semantic Prior Knowledge in Deep Learning Through Embedding-Space Constraints | Sep 30, 2019 | Data Augmentation, Question Answering | Unverified | 0 |
| Compact Trilinear Interaction for Visual Question Answering | Sep 26, 2019 | Benchmarking, Knowledge Distillation | Code Available | 0 |
| Why Does the VQA Model Answer No?: Improving Reasoning through Visual and Linguistic Inference | Sep 25, 2019 | Common Sense Reasoning, Question Answering | Unverified | 0 |
| On Incorporating Semantic Prior Knowledge in Deep Learning Through Embedding-Space Constraints | Sep 25, 2019 | Data Augmentation, Question Answering | Unverified | 0 |
| Learning to Recognize the Unseen Visual Predicates | Sep 25, 2019 | Question Answering, Visual Question Answering | Unverified | 0 |
| UNITER: Learning UNiversal Image-TExt Representations | Sep 25, 2019 | Image-text matching, Image-text Retrieval | Unverified | 0 |