Explicit Object Relation Alignment for Vision and Language Navigation
2021-11-16 · ACL ARR November 2021
Anonymous
Abstract
We propose a neural agent to solve the navigation instruction following problem in a photo-realistic environment. We explicitly align the spatial information in both the instruction and the visual environment, including landmarks and the spatial relationships between the agent and the landmarks. Our method significantly improves over the baseline and is competitive with the SOTA in unseen environments. The qualitative analysis shows that explicitly modeled spatial reasoning improves the explainability of the action decisions and the generalizability of the model.