| Retrieving and Highlighting Action with Spatiotemporal Reference | May 19, 2020 | Action Recognition, Cross-Modal Retrieval | Unverified | 0 |
| Cross-Modality Relevance for Reasoning on Language and Vision | May 12, 2020 | Question Answering, Visual Question Answering | Code Available | 1 |
| Dynamic Language Binding in Relational Visual Reasoning | Apr 30, 2020 | Object, Question Answering | Code Available | 1 |
| Differentiable Adaptive Computation Time for Visual Reasoning | Apr 27, 2020 | Visual Reasoning | Code Available | 1 |
| Machine Number Sense: A Dataset of Visual Arithmetic Problems for Abstract and Relational Reasoning | Apr 25, 2020 | Relational Reasoning, Visual Reasoning | Code Available | 1 |
| Five Points to Check when Comparing Visual Perception in Humans and Machines | Apr 20, 2020 | Decision Making, Object Recognition | Code Available | 0 |
| SHOP-VRB: A Visual Reasoning Benchmark for Object Perception | Apr 6, 2020 | Object, Visual Reasoning | Unverified | 0 |
| Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers | Apr 2, 2020 | Image-text matching, Image-text Retrieval | Code Available | 1 |
| TextCaps: a Dataset for Image Captioning with Reading Comprehension | Mar 24, 2020 | Image Captioning, Optical Character Recognition | Unverified | 0 |
| Learning Rope Manipulation Policies Using Dense Object Descriptors Trained on Synthetic Depth Data | Mar 3, 2020 | Robot Manipulation, Visual Reasoning | Unverified | 0 |
| Cops-Ref: A new Dataset and Task on Compositional Referring Expression Comprehension | Mar 1, 2020 | Referring Expression, Referring Expression Comprehension | Unverified | 0 |
| Weakly Supervised Visual Semantic Parsing | Jan 8, 2020 | Graph Generation, Image Retrieval | Code Available | 1 |
| Smart Home Appliances: Chat with Your Fridge | Dec 19, 2019 | Dataset Generation, Visual Reasoning | Code Available | 0 |
| Transfer Learning in Visual and Relational Reasoning | Nov 27, 2019 | Question Answering, Relational Reasoning | Unverified | 0 |
| ChartNet: Visual Reasoning over Statistical Charts using MAC-Networks | Nov 21, 2019 | General Classification, Visual Reasoning | Unverified | 0 |
| Temporal Reasoning via Audio Question Answering | Nov 21, 2019 | Audio Question Answering, Diagnostic | Code Available | 0 |
| Modeling Gestalt Visual Reasoning on the Raven's Progressive Matrices Intelligence Test Using Generative Image Inpainting Techniques | Nov 18, 2019 | Image Inpainting, Visual Reasoning | Unverified | 0 |
| Program synthesis performance constrained by non-linear spatial relations in Synthetic Visual Reasoning Test | Nov 18, 2019 | Few-Shot Learning, Program Synthesis | Code Available | 0 |
| Attention on Abstract Visual Reasoning | Nov 14, 2019 | Program induction, Relation | Unverified | 0 |
| Big Generalizations with Small Data: Exploring the Role of Training Samples in Learning Adjectives of Size | Nov 1, 2019 | Small Data Image Classification, Visual Reasoning | Unverified | 0 |
| Meta Module Network for Compositional Visual Reasoning | Oct 8, 2019 | MORPH, Visual Reasoning | Code Available | 0 |
| Modulated Self-attention Convolutional Network for VQA | Oct 8, 2019 | Question Answering, Visual Question Answering | Unverified | 0 |
| Few-Shot Abstract Visual Reasoning With Spectral Features | Oct 4, 2019 | Few-Shot Learning, Visual Reasoning | Unverified | 0 |
| CLEVRER: CoLlision Events for Video REpresentation and Reasoning | Oct 3, 2019 | counterfactual, Descriptive | Code Available | 0 |
| UNITER: UNiversal Image-TExt Representation Learning | Sep 25, 2019 | Image-text matching, Image-text Retrieval | Code Available | 1 |
| Procedural Reasoning Networks for Understanding Multimodal Procedures | Sep 19, 2019 | Inductive Bias, Visual Reasoning | Unverified | 0 |
| Towards Explainable Neural-Symbolic Visual Reasoning | Sep 19, 2019 | Explainable artificial intelligence, Explainable Artificial Intelligence (XAI) | Unverified | 0 |
| Dynamic Graph Attention for Referring Expression Comprehension | Sep 18, 2019 | Graph Attention, Referring Expression | Unverified | 0 |
| Modelling Working Memory using Deep Recurrent Reinforcement Learning | Sep 11, 2019 | Decision Making, reinforcement-learning | Unverified | 0 |
| Visual Semantic Reasoning for Image-Text Matching | Sep 6, 2019 | Cross-Modal Retrieval, Image Retrieval | Code Available | 1 |
| LXMERT: Learning Cross-Modality Encoder Representations from Transformers | Aug 20, 2019 | Language Modeling, Language Modelling | Code Available | 1 |
| PHYRE: A New Benchmark for Physical Reasoning | Aug 15, 2019 | Visual Reasoning | Code Available | 1 |
| VisualBERT: A Simple and Performant Baseline for Vision and Language | Aug 9, 2019 | Language Modeling, Language Modelling | Code Available | 1 |
| ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks | Aug 6, 2019 | Image Retrieval, Question Answering | Code Available | 1 |
| V-PROM: A Benchmark for Visual Reasoning Using Visual Progressive Matrices | Jul 29, 2019 | Visual Reasoning | Unverified | 0 |
| 2nd Place Solution to the GQA Challenge 2019 | Jul 16, 2019 | Question Answering, Visual Question Answering | Unverified | 0 |
| Learning by Abstraction: The Neural State Machine | Jul 9, 2019 | Visual Question Answering (VQA), Visual Reasoning | Code Available | 0 |
| Learning to Compose and Reason with Language Tree Structures for Visual Grounding | Jun 5, 2019 | Visual Grounding, Visual Reasoning | Unverified | 0 |
| Iterative Search for Weakly Supervised Semantic Parsing | Jun 1, 2019 | Semantic Parsing, Visual Reasoning | Unverified | 0 |
| Social-IQ: A Question Answering Benchmark for Artificial Social Intelligence | Jun 1, 2019 | Question Answering, Visual Reasoning | Unverified | 0 |
| It's Not About the Journey; It's About the Destination: Following Soft Paths Under Question-Guidance for Visual Reasoning | Jun 1, 2019 | Transfer Learning, Visual Reasoning | Unverified | 0 |
| Are Disentangled Representations Helpful for Abstract Visual Reasoning? | May 29, 2019 | Disentanglement, Visual Reasoning | Unverified | 0 |
| Learning Dynamics of Attention: Human Prior for Interpretable Machine Reasoning | May 28, 2019 | Visual Reasoning | Code Available | 0 |
| Deep Reason: A Strong Baseline for Real-World Visual Reasoning | May 24, 2019 | Visual Reasoning | Unverified | 0 |
| Learning to Collocate Neural Modules for Image Captioning | Apr 18, 2019 | Decoder, Image Captioning | Unverified | 0 |
| Question Guided Modular Routing Networks for Visual Question Answering | Apr 17, 2019 | Question Answering, Visual Question Answering | Unverified | 0 |
| RAVEN: A Dataset for Relational and Analogical Visual rEasoNing | Mar 7, 2019 | Object Recognition, Question Answering | Unverified | 0 |
| From Visual to Acoustic Question Answering | Feb 28, 2019 | Acoustic Question Answering, Position | Unverified | 0 |
| Differentiable Scene Graphs | Feb 26, 2019 | Visual Reasoning | Code Available | 0 |
| GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering | Feb 25, 2019 | Question Answering, Visual Question Answering (VQA) | Code Available | 1 |