Learning Task Decomposition with Order-Memory Policy Network

2021-01-01ICLR 2021Unverified0· sign in to hype

Yuchen Lu, Yikang Shen, Siyuan Zhou, Aaron Courville, Joshua B. Tenenbaum, Chuang Gan

Unverified — Be the first to reproduce this paper.

Abstract

Many complex real-world tasks are composed of several levels of sub-tasks. Humans leverage these hierarchical structures to accelerate the learning process and achieve better generalization. To simulate this process, we introduce Ordered Memory Policy Network (OMPN) to discover task decomposition by imitation learning from demonstration. OMPN has an explicit inductive bias to model a hierarchy of sub-tasks. Experiments on Craft world and Dial demonstrate that our model can more accurately recover the task boundaries with behavior cloning under both unsupervised and weakly supervised setting than previous methods. OMPN can also be directly applied to partially observable environments and still achieve high performance. Our visualization further confirms the intuition that OMPN can learn to expand the memory at higher levels when one subtask is close to completion.

Tasks

Imitation Learning Inductive Bias

Learning Task Decomposition with Order-Memory Policy Network

Abstract

Tasks

Reproductions