| g3D-LF: Generalizable 3D-Language Feature Fields for Embodied Tasks | Nov 26, 2024 | Contrastive Learning, Question Answering | Code Available | 1 | 5 |
| Learning Vision-and-Language Navigation from YouTube Videos | Jul 22, 2023 | Navigate, Vision and Language Navigation | Code Available | 1 | 5 |
| KERM: Knowledge Enhanced Reasoning for Vision-and-Language Navigation | Mar 28, 2023 | Navigate, Vision and Language Navigation | Code Available | 1 | 5 |
| Learning Navigational Visual Representations with Semantic Map Supervision | Jul 23, 2023 | Representation Learning, Self-Supervised Learning | Code Available | 1 | 5 |
| A Recurrent Vision-and-Language BERT for Navigation | Nov 26, 2020 | Decision Making, Decoder | Code Available | 1 | 5 |
| GridMM: Grid Memory Map for Vision-and-Language Navigation | Jul 24, 2023 | Navigate, Vision and Language Navigation | Code Available | 1 | 5 |
| Grounded Entity-Landmark Adaptive Pre-training for Vision-and-Language Navigation | Aug 24, 2023 | cross-modal alignment, Descriptive | Code Available | 1 | 5 |
| Landmark-RxR: Solving Vision-and-Language Navigation with Fine-Grained Alignment Supervision | Dec 1, 2021 | cross-modal alignment, Navigate | Code Available | 1 | 5 |
| Retouchdown: Adding Touchdown to StreetLearn as a Shareable Resource for Language Grounding Tasks in Street View | Jan 10, 2020 | Vision and Language Navigation | Code Available | 1 | 5 |
| WebVLN: Vision-and-Language Navigation on Websites | Dec 25, 2023 | Navigate, Vision and Language Navigation | Code Available | 1 | 5 |