Learning Efficient Planning-based Rewards for Imitation Learning
Xingrui Yu, Yueming Lyu, Ivor Tsang
Abstract
Imitation learning from limited demonstrations is challenging. Most inverse reinforcement learning (IRL) methods are unable to perform as well as the demonstrator, especially in high-dimensional environments such as the Atari domain. To address this challenge, we propose a novel reward learning method that streamlines a differentiable planning module with dynamics modeling. Our method learns useful planning computations with a meaningful reward function that focuses on the resulting region of an agent executing an action. Such a planning-based reward function leads to policies that generalize well to new tasks. Empirical results with multiple network architectures and reward instances show that our method can outperform state-of-the-art IRL methods on multiple Atari games and continuous control tasks. On average, our method achieves 1,139.1% of the demonstrator's performance.
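To illustrate the core idea of a planning-based reward, the following is a minimal sketch (not the authors' implementation): a small grid world where a per-cell reward map is refined by value iteration, and the reward assigned to an action is the value of the *resulting* cell, mirroring the abstract's focus on the region an action leads to. The grid size, discount factor, and deterministic dynamics are all illustrative assumptions.

```python
import numpy as np

def value_iteration(reward_map, gamma=0.9, iters=50):
    """Plan over a 2D reward map; actions move up/down/left/right.

    Illustrative stand-in for a differentiable planning module:
    repeated Bellman backups over a (here fixed) reward map.
    """
    h, w = reward_map.shape
    v = np.zeros((h, w))
    for _ in range(iters):
        # Successor values per action, clamping at the grid edges.
        up    = np.vstack([v[:1], v[:-1]])    # value of the cell above
        down  = np.vstack([v[1:], v[-1:]])    # value of the cell below
        left  = np.hstack([v[:, :1], v[:, :-1]])
        right = np.hstack([v[:, 1:], v[:, -1:]])
        v = reward_map + gamma * np.max(np.stack([up, down, left, right]), axis=0)
    return v

def planning_based_reward(values, state, action):
    """Reward = planned value of the region reached by `action` in `state`."""
    moves = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
    r, c = state
    dr, dc = moves[action]
    nr = min(max(r + dr, 0), values.shape[0] - 1)
    nc = min(max(c + dc, 0), values.shape[1] - 1)
    return values[nr, nc]

# Toy "learned" reward map with a goal in the bottom-right corner.
reward_map = np.zeros((4, 4))
reward_map[3, 3] = 1.0
values = value_iteration(reward_map)

# An action that moves toward the goal region scores higher than one
# that moves away, which is the signal an imitating policy would follow.
assert planning_based_reward(values, (1, 1), "down") > planning_based_reward(values, (1, 1), "up")
```

In the paper's setting the reward map itself is learned from demonstrations and the planner is differentiable end to end; this sketch only shows how planning turns a local reward into an action-level signal about the resulting region.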