| NaVid: Video-based VLM Plans the Next Step for Vision-and-Language Navigation | Feb 24, 2024 | Decision Making, Instruction Following | Unverified | 0 |
| WebLINX: Real-World Website Navigation with Multi-Turn Dialogue | Feb 8, 2024 | Conversational Web Navigation, Text Generation | Code available | 5 |
| VLN-Video: Utilizing Driving Videos for Outdoor Vision-and-Language Navigation | Feb 5, 2024 | Language Modeling | Unverified | 0 |
| NavHint: Vision and Language Navigation Agent with a Hint Generator | Feb 4, 2024 | Vision and Language Navigation | Code available | 0 |
| MapGPT: Map-Guided Prompting with Adaptive Path Planning for Vision-and-Language Navigation | Jan 14, 2024 | Decision Making, Vision and Language Navigation | Unverified | 0 |
| WebVLN: Vision-and-Language Navigation on Websites | Dec 25, 2023 | Navigate, Vision and Language Navigation | Code available | 1 |
| Which way is 'right'?: Uncovering limitations of Vision-and-Language Navigation model | Nov 30, 2023 | Vision and Language Navigation | Unverified | 0 |
| DAP: Domain-aware Prompt Learning for Vision-and-Language Navigation | Nov 29, 2023 | cross-modal alignment, Navigate | Unverified | 0 |
| Does VLN Pretraining Work with Nonsensical or Irrelevant Instructions? | Nov 28, 2023 | Data Augmentation, Translation | Unverified | 0 |
| Fast-Slow Test-Time Adaptation for Online Vision-and-Language Navigation | Nov 22, 2023 | Navigate, Test-time Adaptation | Code available | 1 |
| Vision and Language Navigation in the Real World via Online Visual Language Mapping | Oct 16, 2023 | Vision and Language Navigation | Unverified | 0 |
| LangNav: Language as a Perceptual Representation for Navigation | Oct 11, 2023 | Image Captioning, Language Modeling | Unverified | 0 |
| Evaluating Explanation Methods for Vision-and-Language Navigation | Oct 10, 2023 | Decision Making, Navigate | Unverified | 0 |
| Prompt-based Context- and Domain-aware Pretraining for Vision and Language Navigation | Sep 7, 2023 | Contrastive Learning, cross-modal alignment | Unverified | 0 |
| Grounded Entity-Landmark Adaptive Pre-training for Vision-and-Language Navigation | Aug 24, 2023 | cross-modal alignment, Descriptive | Code available | 1 |
| VLN-PETL: Parameter-Efficient Transfer Learning for Vision-and-Language Navigation | Aug 20, 2023 | Transfer Learning, Vision and Language Navigation | Code available | 0 |
| March in Chat: Interactive Prompting for Remote Embodied Referring Expression | Aug 20, 2023 | Referring Expression, Vision and Language Navigation | Code available | 1 |
| A^2Nav: Action-Aware Zero-Shot Robot Navigation by Exploiting Vision-and-Language Ability of Foundation Models | Aug 15, 2023 | Navigate, Robot Navigation | Unverified | 0 |
| AerialVLN: Vision-and-Language Navigation for UAVs | Aug 13, 2023 | cross-modal alignment, Navigate | Code available | 2 |
| Mind the Gap: Improving Success Rate of Vision-and-Language Navigation by Revisiting Oracle Success Routes | Aug 7, 2023 | Navigate, Vision and Language Navigation | Unverified | 0 |
| Scaling Data Generation in Vision-and-Language Navigation | Jul 28, 2023 | Imitation Learning, Vision and Language Navigation | Code available | 2 |
| Kefa: A Knowledge Enhanced and Fine-grained Aligned Speaker for Navigation Instruction Generation | Jul 25, 2023 | Vision and Language Navigation | Code available | 0 |
| GridMM: Grid Memory Map for Vision-and-Language Navigation | Jul 24, 2023 | Navigate, Vision and Language Navigation | Code available | 1 |
| Learning Navigational Visual Representations with Semantic Map Supervision | Jul 23, 2023 | Representation Learning, Self-Supervised Learning | Code available | 1 |
| Learning Vision-and-Language Navigation from YouTube Videos | Jul 22, 2023 | Navigate, Vision and Language Navigation | Code available | 1 |