| Visual Analytics of Neuron Vulnerability to Adversarial Attacks on Convolutional Neural Networks | Mar 6, 2023 | Autonomous Driving, Medical Diagnosis | Unverified | 0 |
| Learning to reason over visual objects | Mar 3, 2023 | Inductive Bias, Visual Reasoning | Code Available | 0 |
| Jointly Visual- and Semantic-Aware Graph Memory Networks for Temporal Sentence Localization in Videos | Mar 2, 2023 | Representation Learning, Sentence | Unverified | 0 |
| Explicit3D: Graph Network with Spatial Inference for Single Image 3D Object Detection | Feb 13, 2023 | 3D Object Detection, Graph Generation | Unverified | 0 |
| Differentiable Outlier Detection Enable Robust Deep Multimodal Analysis | Feb 11, 2023 | Image-text Retrieval, Knowledge Graphs | Code Available | 0 |
| Learning to Agree on Vision Attention for Visual Commonsense Reasoning | Feb 4, 2023 | Visual Commonsense Reasoning, Visual Reasoning | Unverified | 0 |
| Toward Building General Foundation Models for Language, Vision, and Vision-Language Understanding Tasks | Jan 12, 2023 | Cross-Modal Retrieval, Open-Ended Question Answering | Code Available | 0 |
| A Divide-Align-Conquer Strategy for Program Synthesis | Jan 8, 2023 | ARC, Inductive logic programming | Unverified | 0 |
| Toward Multi-Granularity Decision-Making: Explicit Visual Reasoning with Hierarchical Knowledge | Jan 1, 2023 | Decision Making, Question Answering | Code Available | 0 |
| Open Set Video HOI detection from Action-Centric Chain-of-Look Prompting | Jan 1, 2023 | Human-Object Interaction Detection, Language Modelling | Unverified | 0 |
| Unicode Analogies: An Anti-Objectivist Visual Reasoning Challenge | Jan 1, 2023 | Navigate, Visual Reasoning | Code Available | 0 |
| ViLEM: Visual-Language Error Modeling for Image-Text Retrieval | Jan 1, 2023 | Contrastive Learning, Image-text Retrieval | Unverified | 0 |
| Image as a Foreign Language: BEiT Pretraining for Vision and Vision-Language Tasks | Jan 1, 2023 | Cross-Modal Retrieval, Image Captioning | Unverified | 0 |
| Graph Representation for Order-Aware Visual Transformation | Jan 1, 2023 | Visual Reasoning | Unverified | 0 |
| EuclidNet: Deep Visual Reasoning for Constructible Problems in Geometry | Dec 27, 2022 | Automated Theorem Proving, Visual Reasoning | Unverified | 0 |
| VQA and Visual Reasoning: An Overview of Recent Datasets, Methods and Challenges | Dec 26, 2022 | Representation Learning, Visual Question Answering (VQA) | Unverified | 0 |
| Towards Unsupervised Visual Reasoning: Do Off-The-Shelf Features Know How to Reason? | Dec 20, 2022 | Question Answering, Representation Learning | Unverified | 0 |
| VASR: Visual Analogies of Situation Recognition | Dec 8, 2022 | Common Sense Reasoning, Triplet | Code Available | 0 |
| Visual Question Answering From Another Perspective: CLEVR Mental Rotation Tests | Dec 3, 2022 | Question Answering, Visual Question Answering | Code Available | 0 |
| Does Structural Attention Improve Compositional Representations in Vision-Language Models? | Dec 3, 2022 | Visual Reasoning | Unverified | 0 |
| Abstract Visual Reasoning with Tangram Shapes | Nov 29, 2022 | Visual Reasoning | Unverified | 0 |
| Reason from Context with Self-supervised Learning | Nov 23, 2022 | Object, Object Recognition | Unverified | 0 |
| Unifying Vision-Language Representation Space with Single-tower Transformer | Nov 21, 2022 | Contrastive Learning, Object Localization | Unverified | 0 |
| A survey on knowledge-enhanced multimodal learning | Nov 19, 2022 | Conditional Image Generation, Factual Visual Question Answering | Unverified | 0 |
| lilGym: Natural Language Visual Reasoning with Reinforcement Learning | Nov 3, 2022 | reinforcement-learning, Reinforcement Learning | Unverified | 0 |
| MAMO: Masked Multimodal Modeling for Fine-Grained Vision-Language Representation Learning | Oct 9, 2022 | Image-text Retrieval, multimodal interaction | Unverified | 0 |
| Learning to Collocate Visual-Linguistic Neural Modules for Image Captioning | Oct 4, 2022 | Image Captioning, Sentence | Code Available | 0 |
| Enhancing Interpretability and Interactivity in Robot Manipulation: A Neurosymbolic Approach | Oct 3, 2022 | Referring Expression, Robot Manipulation | Code Available | 0 |
| A Dual-Attention Learning Network with Word and Sentence Embedding for Medical Visual Question Answering | Oct 1, 2022 | Medical Visual Question Answering, Question Answering | Code Available | 0 |
| Zero-shot visual reasoning through probabilistic analogical mapping | Sep 29, 2022 | Visual Reasoning | Unverified | 0 |
| Deep Neural Networks for Visual Reasoning | Sep 24, 2022 | Multimodal Reasoning, Visual Reasoning | Unverified | 0 |
| Compositional Law Parsing with Latent Random Functions | Sep 15, 2022 | Position, Visual Reasoning | Unverified | 0 |
| PaLI: A Jointly-Scaled Multilingual Language-Image Model | Sep 14, 2022 | Decoder, Few-Shot Image Classification | Unverified | 0 |
| Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks | Aug 22, 2022 | All, Cross-Modal Retrieval | Code Available | 0 |
| One for All: One-stage Referring Expression Comprehension with Dynamic Reasoning | Jul 31, 2022 | All, Referring Expression | Unverified | 0 |
| WinoGAViL: Gamified Association Benchmark to Challenge Vision-and-Language Models | Jul 25, 2022 | Common Sense Reasoning, General Knowledge | Code Available | 0 |
| 3D Concept Grounding on Neural Fields | Jul 13, 2022 | Instance Segmentation, Question Answering | Unverified | 0 |
| From Shallow to Deep: Compositional Reasoning over Graphs for Visual Question Answering | Jun 25, 2022 | Question Answering, Visual Question Answering | Unverified | 0 |
| VisFIS: Visual Feature Importance Supervision with Right-for-the-Right-Reason Objectives | Jun 22, 2022 | Feature Importance, Question Answering | Code Available | 0 |
| SAViR-T: Spatially Attentive Visual Reasoning with Transformers | Jun 18, 2022 | Inductive Bias, Visual Reasoning | Code Available | 0 |
| Interactive Visual Reasoning under Uncertainty | Jun 18, 2022 | Visual Reasoning | Unverified | 0 |
| GAMR: A Guided Attention Model for (visual) Reasoning | Jun 10, 2022 | model, Visual Reasoning | Code Available | 0 |
| VL-BEiT: Generative Vision-Language Pretraining | Jun 2, 2022 | image-classification, Image Classification | Unverified | 0 |
| Few-shot Subgoal Planning with Language Models | May 28, 2022 | Language Modeling, Language Modelling | Unverified | 0 |
| Guiding Visual Question Answering with Attention Priors | May 25, 2022 | Question Answering, Visual Grounding | Unverified | 0 |
| Continual learning on 3D point clouds with random compressed rehearsal | May 16, 2022 | Continual Learning, Visual Reasoning | Unverified | 0 |
| Multilevel Hierarchical Network with Multiscale Sampling for Video Question Answering | May 9, 2022 | multimodal interaction, Question Answering | Code Available | 0 |
| Introduction to Soar | May 8, 2022 | Chunking, Decision Making | Unverified | 0 |
| QLEVR: A Diagnostic Dataset for Quantificational Language and Elementary Visual Reasoning | May 6, 2022 | Diagnostic, Question Answering | Code Available | 0 |
| Answer-Me: Multi-Task Open-Vocabulary Visual Question Answering | May 2, 2022 | Decoder, Image Captioning | Unverified | 0 |