Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
2018-01-04 · ICML 2018 · Code Available
Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine
Code
- github.com/haarnoja/sac (official, in paper, TensorFlow) · ★ 0
- github.com/DLR-RM/stable-baselines3 (PyTorch) · ★ 12,962
- github.com/facebookresearch/ReAgent (PyTorch) · ★ 3,690
- github.com/toni-sm/skrl (JAX) · ★ 1,014
- github.com/Rafael1s/Deep-Reinforcement-Learning-Udacity (PyTorch) · ★ 992
- github.com/trackmania-rl/tmrl (PyTorch) · ★ 687
- github.com/tensorlayer/RLzoo (TensorFlow) · ★ 644
- github.com/araffin/sbx (JAX) · ★ 572
- github.com/Kaixhin/imitation-learning (PyTorch) · ★ 563
- github.com/andrejorsula/drl_grasping (PyTorch) · ★ 506
Abstract
Soft actor-critic (SAC) is an off-policy actor-critic deep reinforcement learning algorithm based on the maximum entropy RL framework: the actor maximizes expected return while also maximizing entropy, i.e. succeeding at the task while acting as randomly as possible. By combining off-policy updates with a stable stochastic actor-critic formulation, SAC achieves strong sample efficiency and performance on a range of continuous-control benchmarks, with improved stability across random seeds compared to prior on-policy and off-policy methods.
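SAC's entropy-regularized critic update can be sketched numerically. The soft Bellman target is the usual TD target with the clipped double-Q trick plus an entropy bonus scaled by a temperature alpha; the function and variable names below are illustrative, not taken from any of the repositories listed above.

```python
def soft_td_target(reward, done, gamma, alpha, q1_next, q2_next, logp_next):
    """Soft Bellman backup for SAC's critics (illustrative sketch).

    Target = r + gamma * (min(Q1', Q2') - alpha * log pi(a'|s')),
    i.e. a TD target whose "value" term is augmented by an entropy bonus.
    """
    # Clipped double-Q: take the pessimistic estimate of the next-state value.
    soft_value = min(q1_next, q2_next) - alpha * logp_next
    # Mask the bootstrap term at episode termination.
    return reward + gamma * (1.0 - done) * soft_value

# A more random next action (more negative log-prob) yields a higher target,
# which is exactly the entropy incentive described in the abstract.
t = soft_td_target(reward=1.0, done=0.0, gamma=0.99, alpha=0.2,
                   q1_next=10.0, q2_next=9.5, logp_next=-1.0)
```

With these numbers the entropy term adds `alpha * 1.0 = 0.2` to the pessimistic next-state value of 9.5 before discounting.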
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| Ant-v4 | SAC | Average Return | 5,208.09 | — | Unverified |
| HalfCheetah-v4 | SAC | Average Return | 15,836.04 | — | Unverified |
| Hopper-v4 | SAC | Average Return | 2,882.56 | — | Unverified |
| Humanoid-v4 | SAC | Average Return | 6,211.50 | — | Unverified |
| Walker2d-v4 | SAC | Average Return | 5,745.27 | — | Unverified |
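The "stochastic actor" in the title samples actions from a tanh-squashed Gaussian so that actions stay in a bounded range. A minimal sketch of the change-of-variables log-probability correction for the squashing, a standard SAC detail rather than code from any repository above:

```python
import math

def squashed_gaussian_logp(u, mean, std):
    """Log-density of a = tanh(u) where u ~ Normal(mean, std).

    Subtracts the Jacobian term log|da/du| = log(1 - tanh(u)^2),
    which accounts for squashing the Gaussian sample into (-1, 1).
    """
    # Log-density of the pre-squash Gaussian sample u.
    logp_u = -0.5 * (((u - mean) / std) ** 2
                     + 2.0 * math.log(std)
                     + math.log(2.0 * math.pi))
    # Change-of-variables correction for the tanh squashing.
    return logp_u - math.log(1.0 - math.tanh(u) ** 2)

a = math.tanh(0.3)  # squashed action, guaranteed to lie in (-1, 1)
logp = squashed_gaussian_logp(0.3, mean=0.0, std=1.0)
```

At `u = 0` the tanh Jacobian is 1, so the correction vanishes and the result equals the plain Gaussian log-density; away from 0 the correction grows, reflecting how tanh compresses probability mass near the action bounds.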