| Chop Chop BERT: Visual Question Answering by Chopping VisualBERT's Heads | Apr 30, 2021 | Question Answering, Visual Question Answering | Unverified | 0 |
| Comparing Visual Reasoning in Humans and AI | Apr 29, 2021 | Sentence, Visual Reasoning | Unverified | 0 |
| Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning | Apr 7, 2021 | Representation Learning, Retrieval | Code Available | 1 |
| 'Just because you are right, doesn't mean I am wrong': Overcoming a bottleneck in development and evaluation of Open-Ended VQA tasks | Apr 1, 2021 | Question Answering, Visual Question Answering | Unverified | 0 |
| AGQA: A Benchmark for Compositional Spatio-Temporal Reasoning | Mar 30, 2021 | Question Answering, Video Question Answering | Unverified | 0 |
| Grounding Physical Concepts of Objects and Events Through Dynamic Visual Reasoning | Mar 30, 2021 | counterfactual, Object | Unverified | 0 |
| 'Just because you are right, doesn't mean I am wrong': Overcoming a Bottleneck in the Development and Evaluation of Open-Ended Visual Question Answering (VQA) Tasks | Mar 28, 2021 | Question Answering, Visual Question Answering | Code Available | 0 |
| ACRE: Abstract Causal REasoning Beyond Covariation | Mar 26, 2021 | Blocking, Causal Discovery | Unverified | 0 |
| Raven's Progressive Matrices Completion with Latent Gaussian Process Priors | Mar 22, 2021 | Answer Selection, Gaussian Processes | Code Available | 0 |
| Data augmentation by morphological mixup for solving Raven's Progressive Matrices | Mar 9, 2021 | Data Augmentation, Visual Reasoning | Unverified | 0 |
| Learning Transferable Visual Models From Natural Language Supervision | Feb 26, 2021 | Action Recognition, Benchmarking | Code Available | 2 |
| UniT: Multimodal Multitask Learning with a Unified Transformer | Feb 22, 2021 | Decoder, Multimodal Reasoning | Code Available | 0 |
| Physical Reasoning Using Dynamics-Aware Models | Feb 20, 2021 | Visual Reasoning | Code Available | 0 |
| Improving Scene Graph Classification by Exploiting Knowledge from Texts | Feb 9, 2021 | Classification, General Classification | Unverified | 0 |
| ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision | Feb 5, 2021 | Cross-Modal Retrieval, Image Retrieval | Code Available | 1 |
| Answer Questions with Right Image Regions: A Visual Attention Regularization Approach | Feb 3, 2021 | Question Answering, Visual Grounding | Code Available | 0 |
| Reasoning over Vision and Language: Exploring the Benefits of Supplemental Knowledge | Jan 15, 2021 | Question Answering, Visual Question Answering (VQA) | Unverified | 0 |
| Transformers in Vision: A Survey | Jan 4, 2021 | Action Recognition, Activity Recognition | Unverified | 0 |
| VinVL: Revisiting Visual Representations in Vision-Language Models | Jan 2, 2021 | Image Captioning, Image-text matching | Code Available | 2 |
| DVD: A Diagnostic Dataset for Multi-step Reasoning in Video Grounded Dialogue | Jan 1, 2021 | Diagnostic, Object Tracking | Code Available | 1 |
| Grounding Physical Object and Event Concepts Through Dynamic Visual Reasoning | Jan 1, 2021 | counterfactual, Object | Unverified | 0 |
| Object-Centric Diagnosis of Visual Reasoning | Dec 21, 2020 | Diagnostic, Object | Unverified | 0 |
| Attention over learned object embeddings enables complex visual reasoning | Dec 15, 2020 | Object, Video Object Tracking | Unverified | 0 |
| Odd-One-Out Representation Learning | Dec 14, 2020 | Disentanglement, Metric Learning | Code Available | 0 |
| Multi-Label Contrastive Learning for Abstract Visual Reasoning | Dec 3, 2020 | Contrastive Learning, Data Augmentation | Code Available | 0 |
| Learning from Lexical Perturbations for Consistent Visual Question Answering | Nov 26, 2020 | Question Answering, Visual Question Answering | Code Available | 0 |
| Transformation Driven Visual Reasoning | Nov 26, 2020 | Attribute, Triplet | Code Available | 1 |
| Interpretable Visual Reasoning via Induced Symbolic Space | Nov 23, 2020 | Visual Question Answering (VQA), Visual Reasoning | Code Available | 0 |
| Natural Language Rationales with Full-Stack Visual Reasoning: From Pixels to Semantic Frames to Commonsense Graphs | Oct 15, 2020 | Language Modeling, Language Modelling | Code Available | 1 |
| Contextual Modulation for Relation-Level Metaphor Identification | Oct 12, 2020 | Relation, Visual Reasoning | Code Available | 0 |
| Bongard-LOGO: A New Benchmark for Human-Level Concept Learning and Reasoning | Oct 2, 2020 | Novel Concepts, Representation Learning | Code Available | 1 |
| CLEVR Parser: A Graph Parser Library for Geometric Learning on Language Grounded Image Scenes | Sep 19, 2020 | Graph Neural Network, Visual Reasoning | Code Available | 0 |
| A Distance-preserving Matrix Sketch | Sep 8, 2020 | Clustering, feature selection | Code Available | 0 |
| Video Captioning Using Weak Annotation | Sep 2, 2020 | Sentence, Video Captioning | Unverified | 0 |
| Learning Long-term Visual Dynamics with Region Proposal Interaction Networks | Aug 5, 2020 | Common Sense Reasoning, Object | Code Available | 1 |
| A Closer Look at Generalisation in RAVEN | Aug 1, 2020 | Visual Reasoning | Code Available | 1 |
| TRRNet: Tiered Relation Reasoning for Compositional Visual Question Answering | Aug 1, 2020 | Object, Question Answering | Unverified | 0 |
| Interpretable Visual Reasoning via Probabilistic Formulation under Natural Supervision | Aug 1, 2020 | Question Answering, Visual Question Answering | Unverified | 0 |
| Few-shot Visual Reasoning with Meta-analogical Contrastive Learning | Jul 23, 2020 | Contrastive Learning, Logical Reasoning | Unverified | 0 |
| Learning to Discretely Compose Reasoning Module Networks for Video Captioning | Jul 17, 2020 | Decoder, Question Answering | Code Available | 1 |
| Multi-Granularity Modularized Network for Abstract Visual Reasoning | Jul 9, 2020 | Visual Grounding, Visual Reasoning | Unverified | 0 |
| Self-Segregating and Coordinated-Segregating Transformer for Focused Deep Multi-Modular Network for Visual Question Answering | Jun 25, 2020 | Diversity, Question Answering | Unverified | 0 |
| Neuro-Symbolic Visual Reasoning: Disentangling "Visual" from "Reasoning" | Jun 20, 2020 | Graph Generation, Question Answering | Unverified | 0 |
| Abstract Diagrammatic Reasoning with Multiplex Graph Networks | Jun 19, 2020 | Graph Neural Network, Visual Reasoning | Unverified | 0 |
| Forward Prediction for Physical Reasoning | Jun 18, 2020 | Prediction, Visual Reasoning | Code Available | 1 |
| Large-Scale Adversarial Training for Vision-and-Language Representation Learning | Jun 11, 2020 | Image-text Retrieval, Question Answering | Code Available | 1 |
| Deep Visual Reasoning: Learning to Predict Action Sequences for Task and Motion Planning from an Initial Scene Image | Jun 9, 2020 | Motion Planning, Task and Motion Planning | Unverified | 0 |
| Attention-Based Context Aware Reasoning for Situation Recognition | Jun 1, 2020 | Action Recognition, Fine-grained Action Recognition | Code Available | 1 |
| Webly Supervised Knowledge Embedding Model for Visual Reasoning | Jun 1, 2020 | model, Representation Learning | Unverified | 0 |
| Structured Multimodal Attentions for TextVQA | Jun 1, 2020 | Graph Attention, Optical Character Recognition (OCR) | Code Available | 1 |