
Model Ensemble-Based Intrinsic Reward for Sparse Reward Reinforcement Learning

2019-09-25

Giseung Park, Whiyoung Jung, Sungho Choi, Youngchul Sung

Abstract

In this paper, a new intrinsic reward generation method for sparse-reward reinforcement learning is proposed, based on an ensemble of dynamics models. In the proposed method, a mixture of multiple dynamics models is used to approximate the true unknown transition probability, and the intrinsic reward is designed as the minimum, over the ensemble, of the surprise of each dynamics model relative to the mixture of the dynamics models. To demonstrate the effectiveness of the proposed method, a working algorithm is constructed by combining the intrinsic reward generation method with the proximal policy optimization (PPO) algorithm. Numerical results show that, on representative locomotion tasks, the proposed model-ensemble-based intrinsic reward generation method outperforms previous methods based on a single dynamics model.
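The abstract leaves the exact form of the reward to the paper, but the idea can be sketched under explicit assumptions: each ensemble member predicts a diagonal-Gaussian distribution over the next state, the mixture is the uniform average of the member densities, and the "surprise" of member i relative to the mixture is read here as the pointwise log-likelihood ratio between the mixture and that member, with the intrinsic reward taken as the minimum over members. The function names, the Gaussian likelihood model, and this particular log-ratio reading are all illustrative assumptions, not the paper's verified definition.

```python
import numpy as np

def gaussian_log_prob(x, mean, std):
    # Log-density of a diagonal Gaussian, summed over state dimensions.
    return np.sum(-0.5 * np.log(2.0 * np.pi * std**2)
                  - (x - mean)**2 / (2.0 * std**2))

def intrinsic_reward(next_state, pred_means, pred_stds):
    """Hypothetical sketch of a model-ensemble intrinsic reward.

    pred_means / pred_stds: per-model Gaussian predictions of next_state.
    The mixture is the uniform average of the member densities; the
    per-model 'surprise' is taken as log p_mix(s') - log p_i(s'), and the
    intrinsic reward is the minimum of these surprises over the ensemble
    (one plausible reading of the abstract, not the paper's exact formula).
    """
    log_probs = np.array([gaussian_log_prob(next_state, m, s)
                          for m, s in zip(pred_means, pred_stds)])
    # log of the mixture density: log( (1/K) * sum_i p_i(s') ).
    mixture_log_prob = np.log(np.mean(np.exp(log_probs)))
    surprises = mixture_log_prob - log_probs
    return np.min(surprises)
```

One consequence of this reading is easy to check: when all ensemble members agree exactly, the mixture coincides with every member and the reward is zero, so the bonus vanishes in well-explored regions where the models have converged, which matches the exploration intuition behind ensemble disagreement.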
