
Maximum a Posteriori Policy Optimisation

2018-06-14 · ICLR 2018 · Code Available

Abbas Abdolmaleki, Jost Tobias Springenberg, Yuval Tassa, Remi Munos, Nicolas Heess, Martin Riedmiller


Abstract

We introduce a new algorithm for reinforcement learning called Maximum a Posteriori Policy Optimisation (MPO), based on coordinate ascent on a relative entropy objective. We show that several existing methods can be directly related to our derivation. We develop two off-policy algorithms and demonstrate that they are competitive with the state of the art in deep reinforcement learning. In particular, for continuous control, our method outperforms existing methods with respect to sample efficiency, premature convergence, and robustness to hyperparameter settings, while achieving similar or better final performance.
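The "coordinate ascent on a relative entropy objective" mentioned in the abstract corresponds to MPO's alternation between a nonparametric E-step and a parametric M-step. Below is a minimal NumPy sketch of the E-step weighting, assuming the well-known nonparametric solution q(a|s) ∝ π(a|s) exp(Q(s,a)/η); the names `e_step_weights`, `q_values`, and `eta` are illustrative placeholders, not identifiers from the authors' code.

```python
import numpy as np

def e_step_weights(q_values: np.ndarray, eta: float) -> np.ndarray:
    """E-step of MPO (sketch): softmax over Q-values of sampled actions.

    q_values: shape (num_actions,), Q(s, a_i) for actions a_i sampled from
              the current policy at a single state s.
    eta:      temperature of the relative entropy (KL) constraint.

    Returns normalized weights defining the nonparametric target q(a|s),
    proportional to pi(a|s) * exp(Q(s, a) / eta) under self-normalized
    importance weighting of the policy's own samples.
    """
    logits = q_values / eta
    logits -= logits.max()          # subtract max for numerical stability
    w = np.exp(logits)
    return w / w.sum()

# The M-step would then fit the parametric policy by weighted maximum
# likelihood on the sampled actions, subject to a KL trust region on the
# policy update; eta itself is obtained by minimizing a convex dual.
```

In practice the temperature η is not hand-tuned but solved for from the KL constraint, which is one source of the hyperparameter robustness the abstract claims.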
