SOTAVerified

A novel DDPG method with prioritized experience replay

2017-10-01 · IEEE International Conference on Systems, Man and Cybernetics (SMC) 2017 · Code Available

Yuenan Hou, Lifeng Liu, Qing Wei, Xudong Xu, Chunlin Chen


Abstract

Recently, a state-of-the-art algorithm called deep deterministic policy gradient (DDPG) has achieved good performance on many continuous control tasks in the MuJoCo simulator. To further improve the efficiency of the experience replay mechanism in DDPG, and thus speed up training, this paper proposes a prioritized experience replay method for the DDPG algorithm, in which prioritized sampling is adopted instead of uniform sampling. The proposed DDPG with prioritized experience replay is tested on an inverted pendulum task via OpenAI Gym. The experimental results show that DDPG with prioritized experience replay reduces training time, improves the stability of the training process, and is less sensitive to changes in hyperparameters such as the replay buffer size, the minibatch size, and the update rate of the target network.
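The core change the abstract describes — sampling transitions in proportion to a priority rather than uniformly — can be sketched as a small replay buffer. This is an illustrative sketch, not the authors' implementation: it assumes proportional prioritization with priorities derived from TD-error magnitudes and importance-sampling weights to correct the sampling bias, as in standard prioritized experience replay; all class and parameter names here are hypothetical.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Minimal proportional prioritized replay (illustrative sketch).

    Stored priority: p_i = (|delta_i| + eps)^alpha, where delta_i is the
    TD error of transition i. Sampling probability: P(i) = p_i / sum_k p_k.
    Importance-sampling weight: w_i = (N * P(i))^(-beta), normalized by
    the batch maximum so weights stay in (0, 1].
    """

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha          # how strongly priorities skew sampling
        self.eps = eps              # keeps every priority strictly positive
        self.data = [None] * capacity
        self.priorities = np.zeros(capacity)
        self.pos = 0
        self.size = 0

    def add(self, transition):
        # New transitions get the current max priority so each one
        # is likely to be replayed at least once.
        max_p = self.priorities[:self.size].max() if self.size > 0 else 1.0
        self.data[self.pos] = transition
        self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size, beta=0.4):
        # Prioritized sampling instead of uniform sampling.
        probs = self.priorities[:self.size]
        probs = probs / probs.sum()
        idx = np.random.choice(self.size, batch_size, p=probs)
        # Importance-sampling weights correct the induced bias.
        weights = (self.size * probs[idx]) ** (-beta)
        weights /= weights.max()
        batch = [self.data[i] for i in idx]
        return idx, batch, weights

    def update_priorities(self, idx, td_errors):
        # After a learning step, refresh priorities from new TD errors.
        self.priorities[idx] = (np.abs(td_errors) + self.eps) ** self.alpha
```

In a DDPG training loop, the critic's TD errors for the sampled minibatch would be fed back through `update_priorities`, and the importance-sampling weights would scale the per-sample critic loss.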
