| LOViS: Learning Orientation and Visual Signals for Vision and Language Navigation | Sep 26, 2022 | Spatial Reasoning, Vision and Language Navigation | Code Available | 0 |
| Toward 3D Spatial Reasoning for Human-like Text-based Visual Question Answering | Sep 21, 2022 | Image Captioning, Optical Character Recognition (OCR) | Unverified | 0 |
| CASPER: Cognitive Architecture for Social Perception and Engagement in Robots | Sep 1, 2022 | Action Recognition, Navigate | Unverified | 0 |
| Knowing Earlier what Right Means to You: A Comprehensive VQA Dataset for Grounding Relative Directions via Multi-Task Learning | Jul 6, 2022 | Diagnostic, Multi-Task Learning | Code Available | 0 |
| Translating Place-Related Questions to GeoSPARQL Queries | May 6, 2022 | Geographic Question Answering, Question Answering | Code Available | 0 |
| Explicit Object Relation Alignment for Vision and Language Navigation | May 1, 2022 | Object, Relation | Code Available | 0 |
| DeepSSN: a deep convolutional neural network to assess spatial scene similarity | Feb 7, 2022 | Data Augmentation, Information Retrieval | Code Available | 0 |
| ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension | Nov 16, 2021 | Image Classification | Unverified | 0 |
| Explicit Object Relation Alignment for Vision and Language Navigation | Nov 16, 2021 | Instruction Following, Relation | Unverified | 0 |
| Graph Relation Transformer: Incorporating pairwise object features into the Transformer architecture | Nov 11, 2021 | Graph Attention, Question Answering | Unverified | 0 |