| Analysis of Visual Reasoning on One-Stage Object Detection | Feb 26, 2022 | Object, object-detection | Unverified | 0 |
| Joint Answering and Explanation for Visual Commonsense Reasoning | Feb 25, 2022 | Knowledge Distillation, Question Answering | Code Available | 0 |
| Measuring CLEVRness: Blackbox testing of Visual Reasoning Models | Feb 24, 2022 | Benchmarking, Diagnostic | Unverified | 0 |
| A Review of Emerging Research Directions in Abstract Visual Reasoning | Feb 21, 2022 | Visual Reasoning | Unverified | 0 |
| Grammar-Based Grounded Lexicon Learning | Feb 17, 2022 | Network Embedding, Sentence | Unverified | 0 |
| The Abduction of Sherlock Holmes: A Dataset for Visual Abductive Reasoning | Feb 10, 2022 | Diagnostic, Visual Abductive Reasoning | Code Available | 0 |
| DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models | Feb 8, 2022 | Diagnostic, Image Captioning | Code Available | 3 |
| OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework | Feb 7, 2022 | Image Captioning, image-classification | Code Available | 0 |
| Adaptive Discrete Communication Bottlenecks with Dynamic Vector Quantization | Feb 2, 2022 | Quantization, reinforcement-learning | Unverified | 0 |
| Deep Learning Methods for Abstract Visual Reasoning: A Survey on Raven's Progressive Matrices | Jan 28, 2022 | Visual Reasoning | Unverified | 0 |
| BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation | Jan 28, 2022 | Image Captioning, Image-text matching | Code Available | 5 |
| Deconfounded Visual Grounding | Dec 31, 2021 | Referring Expression, Visual Grounding | Code Available | 0 |
| Comprehensive Visual Question Answering on Point Clouds through Compositional Scene Manipulation | Dec 22, 2021 | Common Sense Reasoning, Question Answering | Code Available | 1 |
| Distilled Dual-Encoder Model for Vision-Language Understanding | Dec 16, 2021 | Image to text, model | Code Available | 1 |
| PTR: A Benchmark for Part-based Conceptual, Relational, and Physical Reasoning | Dec 9, 2021 | Diagnostic, Instance Segmentation | Unverified | 0 |
| FLAVA: A Foundational Language And Vision Alignment Model | Dec 8, 2021 | Image Retrieval, Image-to-Text Retrieval | Code Available | 1 |
| Robust Visual Reasoning via Language Guided Neural Module Networks | Dec 1, 2021 | Question Answering, Referring Expression | Unverified | 0 |
| Recurrent Vision Transformer for Solving Visual Reasoning Problems | Nov 29, 2021 | Object Detection, Visual Reasoning | Unverified | 0 |
| An in-depth experimental study of sensor usage and visual reasoning of robots navigating in real environments | Nov 29, 2021 | Benchmarking, Visual Navigation | Unverified | 0 |
| Two-stage Rule-induction Visual Reasoning on RPMs with an Application to Video Prediction | Nov 24, 2021 | Logical Reasoning, Video Prediction | Unverified | 0 |
| Grounded Situation Recognition with Transformers | Nov 19, 2021 | Decoder, Grounded Situation Recognition | Code Available | 1 |
| Co-VQA : Answering by Interactive Sub Question Sequence | Nov 16, 2021 | Question Answering, Visual Question Answering | Unverified | 0 |
| Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts | Nov 16, 2021 | Cross-Modal Retrieval, Image Captioning | Code Available | 1 |
| VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts | Nov 3, 2021 | Image Retrieval, Image-text Retrieval | Code Available | 1 |
| An Empirical Study of Training End-to-End Vision-and-Language Transformers | Nov 3, 2021 | Cross-Modal Retrieval, Decoder | Code Available | 1 |
| Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language | Oct 28, 2021 | counterfactual, Visual Reasoning | Unverified | 0 |
| Neural-guided, Bidirectional Program Search for Abstraction and Reasoning | Oct 22, 2021 | ARC, Program Synthesis | Unverified | 0 |
| Neural Structure Mapping For Learning Abstract Visual Analogies | Oct 12, 2021 | Visual Analogies, Visual Reasoning | Unverified | 0 |
| ProTo: Program-Guided Transformer for Program-Guided Tasks | Oct 2, 2021 | Decision Making, Learning to Execute | Code Available | 1 |
| Measuring CLEVRness: Black-box Testing of Visual Reasoning Models | Sep 29, 2021 | Benchmarking, Diagnostic | Unverified | 0 |
| INFERNO: Inferring Object-Centric 3D Scene Representations without Supervision | Sep 29, 2021 | Object, Video Object Tracking | Unverified | 0 |
| Visually Grounded Reasoning across Languages and Cultures | Sep 28, 2021 | Cross-Lingual Transfer, Visual Reasoning | Code Available | 1 |
| DAReN: A Collaborative Approach Towards Reasoning And Disentangling | Sep 27, 2021 | Disentanglement, Inductive Bias | Unverified | 0 |
| Weakly Supervised Relative Spatial Reasoning for Visual Question Answering | Sep 4, 2021 | Question Answering, Spatial Reasoning | Code Available | 0 |
| VALSE: A Task-Independent Benchmark for Vision and Language Models centered on Linguistic Phenomena | Aug 17, 2021 | Question Answering, Visual Question Answering | Unverified | 0 |
| ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration | Aug 16, 2021 | Visual Reasoning | Code Available | 1 |
| Image Retrieval on Real-life Images with Pre-trained Vision-and-Language Models | Aug 9, 2021 | Composed Image Retrieval (CoIR), Image Retrieval | Code Available | 1 |
| Understanding the computational demands underlying visual reasoning | Aug 8, 2021 | Visual Reasoning | Unverified | 0 |
| Align before Fuse: Vision and Language Representation Learning with Momentum Distillation | Jul 16, 2021 | Cross-Modal Retrieval, Grounded language learning | Code Available | 1 |
| Enforcing Consistency in Weakly Supervised Semantic Parsing | Jul 13, 2021 | Semantic Parsing, Visual Reasoning | Code Available | 0 |
| Probing Inter-modality: Visual Parsing with Self-Attention for Vision-Language Pre-training | Jun 25, 2021 | Image-text Retrieval, Question Answering | Unverified | 0 |
| Bottom-Up Shift and Reasoning for Referring Image Segmentation | Jun 19, 2021 | Image Segmentation, Segmentation | Code Available | 0 |
| Explicit Knowledge Incorporation for Visual Reasoning | Jun 19, 2021 | Visual Reasoning | Unverified | 0 |
| Techniques for Symbol Grounding with SATNet | Jun 16, 2021 | Logical Reasoning, Visual Reasoning | Code Available | 0 |
| Understanding and Evaluating Racial Biases in Image Captioning | Jun 16, 2021 | Benchmarking, Image Captioning | Code Available | 1 |
| Referring Transformer: A One-step Approach to Multi-task Visual Grounding | Jun 6, 2021 | Decoder, Referring Expression | Code Available | 1 |
| Learning Relation Alignment for Calibrated Cross-modal Retrieval | May 28, 2021 | Cross-Modal Retrieval, Image-text Retrieval | Code Available | 1 |
| Probing Inter-modality: Visual Parsing with Self-Attention for Vision-and-Language Pre-training | May 21, 2021 | Question Answering, Relation | Unverified | 0 |
| Inter-GPS: Interpretable Geometry Problem Solving with Formal Language and Symbolic Reasoning | May 10, 2021 | Arithmetic Reasoning, Geometry Problem Solving | Code Available | 1 |
| Proposal-free One-stage Referring Expression via Grid-Word Cross-Attention | May 5, 2021 | Question Answering, Referring Expression | Unverified | 0 |