Grounding Complex Navigational Instructions Using Scene Graphs

2021-06-03

Michiel de Jong, Satyapriya Krishna, Anuva Agarwal

Abstract

Training a reinforcement learning agent to carry out natural language instructions is limited by the available supervision, i.e., knowing when the instruction has been carried out. We adapt the CLEVR visual question answering dataset to generate complex natural language navigation instructions and accompanying scene graphs, yielding an environment-agnostic supervised dataset. To demonstrate the use of this dataset, we map the scenes to the VizDoom environment and use the gated-attention architecture to train an agent to carry out these more complex language instructions.
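As a rough illustration of how a scene graph can supply the supervision signal the abstract mentions, the sketch below resolves one instruction against a toy scene and checks whether the agent has reached a valid target. It is not the authors' code: the `SceneObject` fields, the "smaller x means left" convention, the helper names, and the success radius are all assumptions made for this example.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class SceneObject:
    # Hypothetical scene-graph node: CLEVR-style attributes plus a 2D position.
    name: str
    color: str
    shape: str
    size: str
    x: float
    y: float


def filter_objects(objects, **attrs):
    """Keep objects whose attributes match every given key/value pair."""
    return [o for o in objects if all(getattr(o, k) == v for k, v in attrs.items())]


def left_of(objects, anchor):
    """Objects with a smaller x coordinate than the anchor (assumed convention)."""
    return [o for o in objects if o is not anchor and o.x < anchor.x]


def resolve_targets(scene):
    """Targets for: 'Go to the red object left of the large sphere.'"""
    targets = []
    for anchor in filter_objects(scene, size="large", shape="sphere"):
        targets.extend(filter_objects(left_of(scene, anchor), color="red"))
    return targets


def instruction_done(agent_xy, targets, radius=1.0):
    """True once the agent is within `radius` of any valid target object."""
    ax, ay = agent_xy
    return any(hypot(ax - t.x, ay - t.y) <= radius for t in targets)


scene = [
    SceneObject("o1", "red", "cube", "small", 1.0, 2.0),
    SceneObject("o2", "blue", "sphere", "large", 4.0, 2.0),
    SceneObject("o3", "red", "cylinder", "small", 6.0, 1.0),
]
targets = resolve_targets(scene)
```

Because the check depends only on object attributes and positions, the same scene and instruction could in principle be mapped to any simulator that can place the objects, which is one reading of "environment-agnostic" here.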

Tasks

Question Answering, Reinforcement Learning (RL), Visual Question Answering (VQA)
