Continuous control with deep reinforcement learning
Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, Daan Wierstra
Code
- github.com/DLR-RM/stable-baselines3 (PyTorch) ★ 12,962
- github.com/facebookresearch/Horizon (PyTorch) ★ 3,686
- github.com/toni-sm/skrl (JAX) ★ 1,014
- github.com/tensorlayer/RLzoo (TensorFlow) ★ 644
- github.com/stevenpjg/ddpg-aigym (TensorFlow) ★ 276
- github.com/baturaysaglam/RIS-MISO-Deep-Reinforcement-Learning (PyTorch) ★ 218
- github.com/massquantity/DBRL (PyTorch) ★ 154
- github.com/Brook1711/RIS_components (TensorFlow) ★ 77
- github.com/shahin-01/vqa-ad (PyTorch) ★ 19
- github.com/s-sd/task-amenability (TensorFlow) ★ 18
Abstract
We adapt the ideas underlying the success of Deep Q-Learning to the continuous action domain. We present an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces. Using the same learning algorithm, network architecture and hyper-parameters, our algorithm robustly solves more than 20 simulated physics tasks, including classic problems such as cartpole swing-up, dexterous manipulation, legged locomotion and car driving. Our algorithm is able to find policies whose performance is competitive with those found by a planning algorithm with full access to the dynamics of the domain and its derivatives. We further demonstrate that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
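The core updates the abstract alludes to are the DDPG learning rules: a critic trained toward the bootstrapped target y = r + γ Q′(s′, μ′(s′)), an actor updated along the deterministic policy gradient ∇ₐQ · ∇θμ, and slowly tracking target networks θ′ ← τθ + (1−τ)θ′. The sketch below illustrates these three updates with hypothetical linear actor/critic parameterizations on a fake minibatch; the paper itself uses deep networks, a replay buffer, exploration noise, and batch normalization, none of which are reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
state_dim, action_dim, tau, gamma, lr = 3, 1, 0.005, 0.99, 1e-3

# Hypothetical linear actor mu(s) = s @ W_mu and critic Q(s, a) = [s, a] @ w_q
# (the paper uses deep networks; linear weights keep the sketch self-contained).
W_mu = rng.normal(size=(state_dim, action_dim))
w_q = rng.normal(size=(state_dim + action_dim,))
W_mu_t, w_q_t = W_mu.copy(), w_q.copy()  # target networks start as copies

# A fake replay-buffer minibatch (s, a, r, s') of 64 transitions.
n = 64
s = rng.normal(size=(n, state_dim))
a = rng.normal(size=(n, action_dim))
r = rng.normal(size=n)
s2 = rng.normal(size=(n, state_dim))

# Critic target from the *target* networks: y = r + gamma * Q'(s', mu'(s')).
a2 = s2 @ W_mu_t
y = r + gamma * (np.concatenate([s2, a2], axis=1) @ w_q_t)

# Critic step: gradient descent on the squared TD error (Q(s, a) - y)^2.
x = np.concatenate([s, a], axis=1)
td = x @ w_q - y
w_q -= lr * (x.T @ td) / n

# Actor step: deterministic policy gradient, dQ/da chained through dmu/dtheta.
dq_da = w_q[state_dim:]                      # for a linear critic, dQ/da is constant
W_mu += lr * (s.T @ np.tile(dq_da, (n, 1))) / n

# Soft target updates: theta' <- tau * theta + (1 - tau) * theta'.
W_mu_t = tau * W_mu + (1 - tau) * W_mu_t
w_q_t = tau * w_q + (1 - tau) * w_q_t
```

Note the separation of roles: the target networks `W_mu_t`/`w_q_t` supply the regression target `y` and only drift toward the learned weights at rate `tau`, which is what stabilizes bootstrapping with function approximation.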
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| Ant-v4 | DDPG | Average Return | 1,712.12 | — | Unverified |
| HalfCheetah-v4 | DDPG | Average Return | 14,934.86 | — | Unverified |
| Hopper-v4 | DDPG | Average Return | 1,290.24 | — | Unverified |
| Humanoid-v4 | DDPG | Average Return | 139.14 | — | Unverified |
| Walker2d-v4 | DDPG | Average Return | 2,994.54 | — | Unverified |