
Learning Efficient Planning-based Rewards for Imitation Learning

2021-01-01

Xingrui Yu, Yueming Lyu, Ivor Tsang


Abstract

Imitation learning from limited demonstrations is challenging. Most inverse reinforcement learning (IRL) methods are unable to perform as well as the demonstrator, especially in high-dimensional environments, e.g., the Atari domain. To address this challenge, we propose a novel reward learning method that integrates a differentiable planning module with dynamics modeling. Our method learns useful planning computations with a meaningful reward function that focuses on the region an agent reaches by executing an action. Such a planning-based reward function leads to policies that generalize well to new tasks. Empirical results with multiple network architectures and reward instances show that our method can outperform state-of-the-art IRL methods on multiple Atari games and continuous control tasks. Our method achieves, on average, 1,139.1% of the demonstrator's performance.
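To make the idea of a planning-based reward concrete, here is a minimal sketch, not the paper's actual implementation: a toy value-iteration planner on a grid (standing in for the differentiable planning module), with an action scored by the planned value of the *resulting* state it leads to. The grid size, discount, and function names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def plan_values(reward_map, gamma=0.9, iters=50):
    """Toy value-iteration planner on a 4-connected grid.

    A simplified, non-differentiable stand-in for a planning module:
    repeatedly backs up the best neighboring value plus the local reward.
    """
    v = np.zeros_like(reward_map, dtype=float)
    for _ in range(iters):
        # Pad with edge values so border cells fall back on themselves.
        p = np.pad(v, 1, mode="edge")
        neighbors = np.stack([
            p[:-2, 1:-1],   # value of the cell above
            p[2:, 1:-1],    # below
            p[1:-1, :-2],   # left
            p[1:-1, 2:],    # right
        ])
        v = reward_map + gamma * neighbors.max(axis=0)
    return v

def planning_based_reward(values, state, action):
    """Score an action by the planned value of the state it results in,
    mirroring the abstract's focus on the region an action leads to."""
    moves = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
    dr, dc = moves[action]
    r = min(max(state[0] + dr, 0), values.shape[0] - 1)
    c = min(max(state[1] + dc, 0), values.shape[1] - 1)
    return values[r, c]

# Hypothetical usage: a 5x5 grid with a goal reward in the corner.
reward_map = np.zeros((5, 5))
reward_map[4, 4] = 1.0
values = plan_values(reward_map)
# Moving toward the goal scores higher than moving away from it.
toward = planning_based_reward(values, (4, 3), "right")
away = planning_based_reward(values, (4, 3), "left")
```

In this toy setting, `toward > away`, so a policy trained on this reward signal is pushed toward actions whose resulting states the planner values highly.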
