Explicit Object Relation Alignment for Vision and Language Navigation

2021-11-16 · ACL ARR November 2021

Anonymous

Abstract

We propose a neural agent for the navigation instruction-following problem in photo-realistic environments. We explicitly align the spatial information in the instruction with that in the visual environment, including landmarks and the spatial relationships between the agent and those landmarks. Our method improves significantly over the baseline and is competitive with the state of the art (SOTA) in unseen environments. Qualitative analysis shows that explicitly modeled spatial reasoning improves both the explainability of the action decisions and the generalizability of the model.

Tasks

Instruction Following, Relation, Spatial Reasoning, Vision and Language Navigation
