| HOP: History-and-Order Aware Pre-training for Vision-and-Language Navigation | Mar 22, 2022 | Decision Making, Language Modeling | Code Available | 1 |
| Cross-modal Map Learning for Vision and Language Navigation | Mar 10, 2022 | Vision and Language Navigation | Code Available | 1 |
| Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation | Mar 5, 2022 | Imitation Learning, Vision and Language Navigation | Code Available | 1 |
| One Step at a Time: Long-Horizon Vision-and-Language Navigation with Milestones | Feb 14, 2022 | Vision and Language Navigation | Code Available | 1 |
| Landmark-RxR: Solving Vision-and-Language Navigation with Fine-Grained Alignment Supervision | Dec 1, 2021 | cross-modal alignment, Navigate | Code Available | 1 |
| Multimodal Transformer with Variable-length Memory for Vision-and-Language Navigation | Nov 10, 2021 | Decoder, Navigate | Code Available | 1 |
| History Aware Multimodal Transformer for Vision-and-Language Navigation | Oct 25, 2021 | Decision Making, Navigate | Code Available | 1 |
| SASRA: Semantically-aware Spatio-temporal Reasoning Agent for Vision-and-Language Navigation in Continuous Environments | Aug 26, 2021 | Vision and Language Navigation | Code Available | 1 |
| Airbert: In-domain Pretraining for Vision-and-Language Navigation | Aug 20, 2021 | Navigate, Referring Expression | Code Available | 1 |
| Adversarial Reinforced Instruction Attacker for Robust Vision-Language Navigation | Jul 23, 2021 | Vision and Language Navigation, Vision-Language Navigation | Code Available | 1 |
| Neighbor-view Enhanced Model for Vision and Language Navigation | Jul 15, 2021 | Navigate, Vision and Language Navigation | Code Available | 1 |
| How Much Can CLIP Benefit Vision-and-Language Tasks? | Jul 13, 2021 | Question Answering, Vision and Language Navigation | Code Available | 1 |
| Pathdreamer: A World Model for Indoor Navigation | May 18, 2021 | model, Semantic Segmentation | Code Available | 1 |
| Episodic Transformer for Vision-and-Language Navigation | May 13, 2021 | Vision and Language Navigation | Code Available | 1 |
| The Road to Know-Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation | Apr 9, 2021 | Vision and Language Navigation, Vision-Language Navigation | Code Available | 1 |
| A Recurrent Vision-and-Language BERT for Navigation | Nov 26, 2020 | Decision Making, Decoder | Code Available | 1 |
| Sim-to-Real Transfer for Vision-and-Language Navigation | Nov 7, 2020 | Vision and Language Navigation | Code Available | 1 |
| Language and Visual Entity Relationship Graph for Agent Navigation | Oct 19, 2020 | Dynamic Time Warping, Navigate | Code Available | 1 |
| Room-Across-Room: Multilingual Vision-and-Language Navigation with Dense Spatiotemporal Grounding | Oct 15, 2020 | Vision and Language Navigation | Code Available | 1 |
| Multimodal Text Style Transfer for Outdoor Vision-and-Language Navigation | Jul 1, 2020 | Style Transfer, Text Style Transfer | Code Available | 1 |
| BabyWalk: Going Farther in Vision-and-Language Navigation by Taking Baby Steps | May 10, 2020 | Imitation Learning, Navigate | Code Available | 1 |
| Diagnosing the Environment Bias in Vision-and-Language Navigation | May 6, 2020 | Vision and Language Navigation | Code Available | 1 |
| Improving Vision-and-Language Navigation with Image-Text Pairs from the Web | Apr 30, 2020 | Vision and Language Navigation | Code Available | 1 |
| Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments | Apr 6, 2020 | Vision and Language Navigation | Code Available | 1 |
| Sub-Instruction Aware Vision-and-Language Navigation | Apr 6, 2020 | Navigate, Vision and Language Navigation | Code Available | 1 |
| Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-training | Feb 25, 2020 | Navigate, Self-Supervised Learning | Code Available | 1 |
| Retouchdown: Adding Touchdown to StreetLearn as a Shareable Resource for Language Grounding Tasks in Street View | Jan 10, 2020 | Vision and Language Navigation | Code Available | 1 |
| VALAN: Vision and Language Agent Navigation | Dec 6, 2019 | Deep Reinforcement Learning, reinforcement-learning | Code Available | 1 |
| Self-Monitoring Navigation Agent via Auxiliary Progress Estimation | Jan 10, 2019 | Natural Language Visual Grounding, Vision and Language Navigation | Code Available | 1 |
| Touchdown: Natural Language Navigation and Spatial Reasoning in Visual Street Environments | Nov 29, 2018 | Position, Spatial Reasoning | Code Available | 1 |
| Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments | Nov 20, 2017 | Reinforcement Learning, Translation | Code Available | 1 |
| Rethinking the Embodied Gap in Vision-and-Language Navigation: A Holistic Study of Physical and Visual Disparities | Jul 17, 2025 | Large Language Model, Vision and Language Navigation | Unverified | 0 |
| Grounded Vision-Language Navigation for UAVs with Open-Vocabulary Goal Understanding | Jun 12, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| A Navigation Framework Utilizing Vision-Language Models | Jun 11, 2025 | Navigate, Prompt Engineering | Code Available | 0 |
| Disrupting Vision-Language Model-Driven Navigation Services via Adversarial Object Fusion | May 29, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| MetaScenes: Towards Automated Replica Creation for Real-world 3D Scans | May 5, 2025 | Vision and Language Navigation | Unverified | 0 |
| DOPE: Dual Object Perception-Enhancement Network for Vision-and-Language Navigation | Apr 30, 2025 | Navigate, Object | Unverified | 0 |
| ST-Booster: An Iterative SpatioTemporal Perception Booster for Vision-and-Language Navigation in Continuous Environments | Apr 14, 2025 | Navigate, Vision and Language Navigation | Unverified | 0 |
| Endowing Embodied Agents with Spatial Reasoning Capabilities for Vision-and-Language Navigation | Apr 9, 2025 | Hallucination, Spatial Reasoning | Unverified | 0 |
| COSMO: Combination of Selective Memorization for Low-cost Vision-and-Language Navigation | Mar 31, 2025 | Memorization, Vision and Language Navigation | Unverified | 0 |
| Do Visual Imaginations Improve Vision-and-Language Navigation Agents? | Mar 20, 2025 | Vision and Language Navigation | Unverified | 0 |
| HA-VLN: A Benchmark for Human-Aware Navigation in Discrete-Continuous Environments with Dynamic Multi-Human Interactions, Real-World Validation, and an Open Leaderboard | Mar 18, 2025 | Benchmarking, Human Dynamics | Unverified | 0 |
| FlexVLN: Flexible Adaptation for Diverse Vision-and-Language Navigation Tasks | Mar 18, 2025 | Vision and Language Navigation | Unverified | 0 |
| Observation-Graph Interaction and Key-Detail Guidance for Vision and Language Navigation | Mar 14, 2025 | cross-modal alignment, Navigate | Unverified | 0 |
| Aerial Vision-and-Language Navigation with Grid-based View Selection and Map Construction | Mar 14, 2025 | Navigate, Vision and Language Navigation | Unverified | 0 |
| PanoGen++: Domain-Adapted Text-Guided Panoramic Environment Generation for Vision-and-Language Navigation | Mar 13, 2025 | Image Inpainting, Image Outpainting | Unverified | 0 |
| SmartWay: Enhanced Waypoint Prediction and Backtracking for Zero-Shot Vision-and-Language Navigation | Mar 13, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Ground-level Viewpoint Vision-and-Language Navigation in Continuous Environments | Feb 26, 2025 | Instruction Following, Vision and Language Navigation | Unverified | 0 |
| TRAVEL: Training-Free Retrieval and Alignment for Vision-and-Language Navigation | Feb 11, 2025 | Retrieval, Vision and Language Navigation | Unverified | 0 |
| Language and Planning in Robotic Navigation: A Multilingual Evaluation of State-of-the-Art Models | Jan 7, 2025 | Instruction Following, Vision and Language Navigation | Unverified | 0 |