Meta-Inverse Reinforcement Learning with Probabilistic Context Variables

2019-09-20 · NeurIPS 2019 · Code Available

Lantao Yu, Tianhe Yu, Chelsea Finn, Stefano Ermon

Code

github.com/ermongroup/MetaIRL
Official · In paper

Abstract

Providing a suitable reward function to reinforcement learning can be difficult in many real-world applications. While inverse reinforcement learning (IRL) holds promise for automatically learning reward functions from demonstrations, several major challenges remain. First, existing IRL methods learn reward functions from scratch, requiring large numbers of demonstrations to correctly infer the reward for each task the agent may need to perform. Second, existing methods typically assume homogeneous demonstrations for a single behavior or task, while in practice, it might be easier to collect datasets of heterogeneous but related behaviors. To this end, we propose a deep latent variable model that is capable of learning rewards from demonstrations of distinct but related tasks in an unsupervised way. Critically, our model can infer rewards for new, structurally similar tasks from a single demonstration. Our experiments on multiple continuous control tasks demonstrate the effectiveness of our approach compared to state-of-the-art imitation and inverse reinforcement learning methods.
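
The abstract gives no architectural details, so the following is only a rough illustration of the general idea of a latent-variable reward model: a context is inferred from a single demonstration, and the reward is conditioned on it. Everything here is an assumption made for illustration, including the linear maps, the dimensions, the mean-pooling of the trajectory, and the names `infer_context`, `sample_context`, and `reward`. It is not the authors' implementation and performs no training.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for this sketch.
STATE_DIM, ACTION_DIM, CONTEXT_DIM = 4, 2, 3

# Hypothetical linear "inference model": maps mean-pooled (state, action)
# features of one demonstration to a diagonal Gaussian over the context.
W_mean = rng.normal(scale=0.1, size=(STATE_DIM + ACTION_DIM, CONTEXT_DIM))
W_logvar = rng.normal(scale=0.1, size=(STATE_DIM + ACTION_DIM, CONTEXT_DIM))

# Hypothetical linear reward over the concatenation (state, action, context).
w_reward = rng.normal(scale=0.1, size=STATE_DIM + ACTION_DIM + CONTEXT_DIM)


def infer_context(states, actions):
    """Return (mean, log-variance) of a Gaussian over the latent context."""
    features = np.concatenate([states, actions], axis=1).mean(axis=0)
    return features @ W_mean, features @ W_logvar


def sample_context(mean, logvar, generator):
    """Draw one context sample via the reparameterization trick."""
    noise = generator.standard_normal(mean.shape)
    return mean + np.exp(0.5 * logvar) * noise


def reward(states, actions, context):
    """Score every step of a trajectory under a fixed context vector."""
    tiled = np.tile(context, (states.shape[0], 1))
    inputs = np.concatenate([states, actions, tiled], axis=1)
    return inputs @ w_reward


# A single made-up demonstration of 10 steps.
demo_states = rng.normal(size=(10, STATE_DIM))
demo_actions = rng.normal(size=(10, ACTION_DIM))

context_mean, context_logvar = infer_context(demo_states, demo_actions)
context = sample_context(context_mean, context_logvar, rng)
step_rewards = reward(demo_states, demo_actions, context)
```

In this sketch, a demonstration of a different task would produce a different context vector, and the same reward weights would then score behavior differently. This is one simple way to read "infer rewards for new tasks from a single demonstration".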

Tasks

Continuous Control, Reinforcement Learning (RL)

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
Ant	PEMIRL	Average Return	846.18	—	Unverified
Point Maze	PEMIRL	Average Return	-7.37	—	Unverified
Sawyer Pusher	PEMIRL	Average Return	-27.16	—	Unverified
Sweeper	PEMIRL	Average Return	-74.17	—	Unverified
