Continuous control with deep reinforcement learning
Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, Daan Wierstra
Code
- github.com/DLR-RM/stable-baselines3 (PyTorch) ★ 12,962
- github.com/facebookresearch/Horizon (PyTorch) ★ 3,686
- github.com/toni-sm/skrl (JAX) ★ 1,014
- github.com/tensorlayer/RLzoo (TensorFlow) ★ 644
- github.com/stevenpjg/ddpg-aigym (TensorFlow) ★ 276
- github.com/baturaysaglam/RIS-MISO-Deep-Reinforcement-Learning (PyTorch) ★ 218
- github.com/massquantity/DBRL (PyTorch) ★ 154
- github.com/Brook1711/RIS_components (TensorFlow) ★ 77
- github.com/shahin-01/vqa-ad (PyTorch) ★ 19
- github.com/s-sd/task-amenability (TensorFlow) ★ 18
Abstract
We adapt the ideas underlying the success of Deep Q-Learning to the continuous action domain. We present an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces. Using the same learning algorithm, network architecture and hyper-parameters, our algorithm robustly solves more than 20 simulated physics tasks, including classic problems such as cartpole swing-up, dexterous manipulation, legged locomotion and car driving. Our algorithm is able to find policies whose performance is competitive with those found by a planning algorithm with full access to the dynamics of the domain and its derivatives. We further demonstrate that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
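The core updates the abstract alludes to are the DDPG learning rules: a critic trained toward the bootstrapped target y = r + γ Q′(s′, μ′(s′)), an actor updated along the deterministic policy gradient ∇ₐQ · ∇θμ, and slowly tracking target networks θ′ ← τθ + (1−τ)θ′. The sketch below illustrates these three updates with hypothetical linear actor/critic parameterizations on a fake minibatch; the paper itself uses deep networks, a replay buffer, exploration noise, and batch normalization, none of which are reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
state_dim, action_dim, tau, gamma, lr = 3, 1, 0.005, 0.99, 1e-3

# Hypothetical linear actor mu(s) = s @ W_mu and critic Q(s, a) = [s, a] @ w_q
# (the paper uses deep networks; linear weights keep the sketch self-contained).
W_mu = rng.normal(size=(state_dim, action_dim))
w_q = rng.normal(size=(state_dim + action_dim,))
W_mu_t, w_q_t = W_mu.copy(), w_q.copy()  # target networks start as copies

# A fake replay-buffer minibatch (s, a, r, s') of 64 transitions.
n = 64
s = rng.normal(size=(n, state_dim))
a = rng.normal(size=(n, action_dim))
r = rng.normal(size=n)
s2 = rng.normal(size=(n, state_dim))

# Critic target from the *target* networks: y = r + gamma * Q'(s', mu'(s')).
a2 = s2 @ W_mu_t
y = r + gamma * (np.concatenate([s2, a2], axis=1) @ w_q_t)

# Critic step: gradient descent on the squared TD error (Q(s, a) - y)^2.
x = np.concatenate([s, a], axis=1)
td = x @ w_q - y
w_q -= lr * (x.T @ td) / n

# Actor step: deterministic policy gradient, dQ/da chained through dmu/dtheta.
dq_da = w_q[state_dim:]                      # for a linear critic, dQ/da is constant
W_mu += lr * (s.T @ np.tile(dq_da, (n, 1))) / n

# Soft target updates: theta' <- tau * theta + (1 - tau) * theta'.
W_mu_t = tau * W_mu + (1 - tau) * W_mu_t
w_q_t = tau * w_q + (1 - tau) * w_q_t
```

Note the separation of roles: the target networks `W_mu_t`/`w_q_t` supply the regression target `y` and only drift toward the learned weights at rate `tau`, which is what stabilizes bootstrapping with function approximation.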
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| Ant-v4 | DDPG | Average Return | 1,712.12 | — | Unverified |
| HalfCheetah-v4 | DDPG | Average Return | 14,934.86 | — | Unverified |
| Hopper-v4 | DDPG | Average Return | 1,290.24 | — | Unverified |
| Humanoid-v4 | DDPG | Average Return | 139.14 | — | Unverified |
| Walker2d-v4 | DDPG | Average Return | 2,994.54 | — | Unverified |