| Landmark-RxR: Solving Vision-and-Language Navigation with Fine-Grained Alignment Supervision | Dec 1, 2021 | cross-modal alignment, Navigate | Code Available | 1 |
| History Aware Multimodal Transformer for Vision-and-Language Navigation | Oct 25, 2021 | Decision Making, Navigate | Code Available | 1 |
| KERM: Knowledge Enhanced Reasoning for Vision-and-Language Navigation | Mar 28, 2023 | Navigate, Vision and Language Navigation | Code Available | 1 |
| Multimodal Transformer with Variable-length Memory for Vision-and-Language Navigation | Nov 10, 2021 | Decoder, Navigate | Code Available | 1 |
| Language and Visual Entity Relationship Graph for Agent Navigation | Oct 19, 2020 | Dynamic Time Warping, Navigate | Code Available | 1 |
| GridMM: Grid Memory Map for Vision-and-Language Navigation | Jul 24, 2023 | Navigate, Vision and Language Navigation | Code Available | 1 |
| March in Chat: Interactive Prompting for Remote Embodied Referring Expression | Aug 20, 2023 | Referring Expression, Vision and Language Navigation | Code Available | 1 |
| Pathdreamer: A World Model for Indoor Navigation | May 18, 2021 | model, Semantic Segmentation | Code Available | 1 |
| Simple and Effective Synthesis of Indoor 3D Scenes | Apr 6, 2022 | Data Augmentation, Vision and Language Navigation | Code Available | 1 |
| WebVLN: Vision-and-Language Navigation on Websites | Dec 25, 2023 | Navigate, Vision and Language Navigation | Code Available | 1 |