Learning Goal-Oriented Vision-and-Language Navigation with Self-Improving Demonstrations at Scale

2026-03-18

Songze Li, Zun Wang, Gengze Zhou, Jialu Li, Xiangyu Zeng, Ziyang Gong, Limin Wang, Yu Qiao, Qi Wu, Mohit Bansal, Yi Wang

Code

github.com/opengvlab/sid-vln
Official

Abstract

Goal-oriented vision-and-language navigation requires agents to explore robustly in order to reach specified goals in unknown environments without step-by-step instructions. Existing methods tend to train exclusively on shortest-path trajectories, which provide no effective exploration priors. To address this, we present SID, a goal-oriented vision-and-language navigation learning approach with Self-Improving Demonstrations. Specifically, SID trains an initial agent on shortest-path data sampled from environments and then uses this agent to generate novel exploration trajectories. These rollouts provide demonstrations with stronger exploration strategies for training a better agent, which in turn produces higher-quality demonstrations for the next round of training. We show that this iterative self-improving pipeline readily scales to new environments, and that the resulting demonstrations are highly transferable, raising the performance ceiling across a variety of vision-and-language navigation tasks. Extensive experiments demonstrate that SID significantly boosts the exploration capabilities and generalization of navigation agents. The resulting agent achieves new state-of-the-art performance on goal-oriented vision-and-language navigation benchmarks, including REVERIE and SOON, and transfers strongly to object-goal navigation and VLN-CE. Notably, it achieves a 50.9% success rate on the unseen validation splits of SOON, surpassing prior leading approaches by a margin of 13.9%.
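
The iterative loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `self_improving_demonstrations` and the `train` and `rollout` callables are hypothetical placeholders, and the sketch assumes each round trains only on the newest rollouts, which the abstract does not specify (the actual method may filter or mix data differently).

```python
from typing import Callable, List, Sequence, TypeVar

Agent = TypeVar("Agent")
Demo = TypeVar("Demo")


def self_improving_demonstrations(
    shortest_path_demos: Sequence[Demo],
    train: Callable[[Sequence[Demo]], Agent],   # hypothetical: fits an agent to demonstrations
    rollout: Callable[[Agent], List[Demo]],     # hypothetical: agent explores, returns trajectories
    rounds: int,
) -> Agent:
    """Alternate between generating demonstrations and retraining on them."""
    # Round 0: the initial agent sees only shortest-path data.
    agent = train(shortest_path_demos)
    for _ in range(rounds):
        # The current agent produces novel exploration trajectories.
        demos = rollout(agent)
        # Assumption: the next agent is trained on these rollouts alone.
        agent = train(demos)
    return agent
```

With `rounds=0` the sketch reduces to the shortest-path-only baseline that the abstract contrasts against; each additional round replaces the training data with rollouts from the previous agent.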
