Smooth Exploration for Robotic Reinforcement Learning
Antonin Raffin, Jens Kober, Freek Stulp
Code
- github.com/DLR-RM/stable-baselines3 (official, in paper; PyTorch; ★ 12,962)
- github.com/facebookresearch/rljax (★ 3,347)
- github.com/araffin/sbx (JAX; ★ 572)
- github.com/markub3327/rl-toolkit (TensorFlow; ★ 21)
Abstract
Reinforcement learning (RL) enables robots to learn skills from interactions with the real world. In practice, the unstructured step-based exploration used in Deep RL, often very successful in simulation, leads to jerky motion patterns on real robots. The resulting shaky behavior can cause poor exploration or even damage the robot. We address these issues by adapting state-dependent exploration (SDE) to current Deep RL algorithms. To enable this adaptation, we propose two extensions to the original SDE: using more general features and re-sampling the noise periodically. Together they lead to a new exploration method, generalized state-dependent exploration (gSDE). We evaluate gSDE both in simulation, on PyBullet continuous control tasks, and directly on three different real robots: a tendon-driven elastic robot, a quadruped, and an RC car. The noise sampling interval of gSDE provides a trade-off between performance and smoothness, which allows training directly on the real robots without loss of performance. The code is available at https://github.com/DLR-RM/stable-baselines3.
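The core idea in the abstract, exploration noise that is a function of the state and whose parameters are re-sampled only every few steps, can be illustrated with a small toy sketch. This is not the authors' implementation: the class name `StateDependentNoise`, the linear noise model, the use of raw features as input, and all default values are assumptions made for illustration.

```python
import random


class StateDependentNoise:
    """Toy sketch (hypothetical): noise = features @ theta, with theta
    re-sampled from a Gaussian every `sample_freq` steps."""

    def __init__(self, feature_dim, action_dim, sigma=0.5, sample_freq=8, seed=0):
        self.feature_dim = feature_dim
        self.action_dim = action_dim
        self.sigma = sigma              # std of the exploration matrix entries
        self.sample_freq = sample_freq  # steps between two re-samplings
        self.rng = random.Random(seed)
        self.steps = 0
        self.theta = None
        self.resample()

    def resample(self):
        # Draw a fresh exploration matrix of shape (feature_dim, action_dim).
        self.theta = [
            [self.rng.gauss(0.0, self.sigma) for _ in range(self.action_dim)]
            for _ in range(self.feature_dim)
        ]

    def __call__(self, features):
        # Re-sample only at interval boundaries; in between, the noise is a
        # deterministic function of the features, so it varies smoothly
        # whenever the features vary smoothly.
        if self.steps > 0 and self.steps % self.sample_freq == 0:
            self.resample()
        self.steps += 1
        return [
            sum(f * self.theta[i][j] for i, f in enumerate(features))
            for j in range(self.action_dim)
        ]
```

Within one sampling interval, identical features yield identical noise, in contrast to step-based exploration, where independent noise is drawn at every step. In Stable-Baselines3, the corresponding options are exposed as the `use_sde` and `sde_sample_freq` arguments of algorithms such as SAC and PPO.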
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| PyBullet Ant | SAC gSDE | Return | 3,459 | — | Unverified |
| PyBullet Ant | TD3 gSDE | Return | 3,267 | — | Unverified |
| PyBullet Ant | TD3 | Return | 2,865 | — | Unverified |
| PyBullet Ant | SAC | Return | 2,859 | — | Unverified |
| PyBullet Ant | PPO gSDE | Return | 2,587 | — | Unverified |
| PyBullet Ant | A2C gSDE | Return | 2,560 | — | Unverified |
| PyBullet Ant | PPO | Return | 2,160 | — | Unverified |
| PyBullet Ant | A2C | Return | 1,967 | — | Unverified |
| PyBullet HalfCheetah | SAC | Return | 2,883 | — | Unverified |
| PyBullet HalfCheetah | SAC gSDE | Return | 2,850 | — | Unverified |
| PyBullet HalfCheetah | PPO gSDE | Return | 2,760 | — | Unverified |
| PyBullet HalfCheetah | TD3 | Return | 2,687 | — | Unverified |
| PyBullet HalfCheetah | TD3 gSDE | Return | 2,578 | — | Unverified |
| PyBullet HalfCheetah | PPO | Return | 2,254 | — | Unverified |
| PyBullet HalfCheetah | A2C gSDE | Return | 2,028 | — | Unverified |
| PyBullet HalfCheetah | A2C | Return | 1,652 | — | Unverified |
| PyBullet Hopper | SAC gSDE | Return | 2,646 | — | Unverified |
| PyBullet Hopper | PPO gSDE | Return | 2,508 | — | Unverified |
| PyBullet Hopper | SAC | Return | 2,477 | — | Unverified |
| PyBullet Hopper | TD3 | Return | 2,470 | — | Unverified |
| PyBullet Hopper | TD3 gSDE | Return | 2,353 | — | Unverified |
| PyBullet Hopper | PPO | Return | 1,622 | — | Unverified |
| PyBullet Hopper | A2C | Return | 1,559 | — | Unverified |
| PyBullet Hopper | A2C gSDE | Return | 1,448 | — | Unverified |
| PyBullet Walker2D | SAC gSDE | Return | 2,341 | — | Unverified |
| PyBullet Walker2D | SAC | Return | 2,215 | — | Unverified |
| PyBullet Walker2D | TD3 | Return | 2,106 | — | Unverified |
| PyBullet Walker2D | TD3 gSDE | Return | 1,989 | — | Unverified |
| PyBullet Walker2D | PPO gSDE | Return | 1,776 | — | Unverified |
| PyBullet Walker2D | PPO | Return | 1,238 | — | Unverified |
| PyBullet Walker2D | A2C gSDE | Return | 694 | — | Unverified |
| PyBullet Walker2D | A2C | Return | 443 | — | Unverified |