IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
Lasse Espeholt, Hubert Soyer, Remi Munos, Karen Simonyan, Volodymir Mnih, Tom Ward, Yotam Doron, Vlad Firoiu, Tim Harley, Iain Dunning, Shane Legg, Koray Kavukcuoglu
Code
- github.com/deepmind/scalable_agent (official, in paper; TensorFlow; ★ 0)
- github.com/jerrodparker20/adaptive-transformers-in-rl (PyTorch; ★ 136)
- github.com/michaelnny/deep_rl_zoo (PyTorch; ★ 122)
- github.com/google-research/valan (TensorFlow; ★ 85)
- github.com/seolhokim/SimpleDistributedRL (PyTorch; ★ 7)
- github.com/windstrip/DeepMind-StreetLearn (TensorFlow; ★ 1)
- github.com/facebookresearch/torchbeast (PyTorch; ★ 0)
- github.com/deepmind/streetlearn (TensorFlow; ★ 0)
- github.com/crazydonkey200/neural-symbolic-machines (TensorFlow; ★ 0)
- github.com/villinvic/Georges (no framework listed; ★ 0)
Abstract
In this work we aim to solve a large collection of tasks using a single reinforcement learning agent with a single set of parameters. A key challenge is to handle the increased amount of data and extended training time. We have developed a new distributed agent IMPALA (Importance Weighted Actor-Learner Architecture) that not only uses resources more efficiently in single-machine training but also scales to thousands of machines without sacrificing data efficiency or resource utilisation. We achieve stable learning at high throughput by combining decoupled acting and learning with a novel off-policy correction method called V-trace. We demonstrate the effectiveness of IMPALA for multi-task reinforcement learning on DMLab-30 (a set of 30 tasks from the DeepMind Lab environment (Beattie et al., 2016)) and Atari-57 (all available Atari games in Arcade Learning Environment (Bellemare et al., 2013a)). Our results show that IMPALA is able to achieve better performance than previous agents with less data, and crucially exhibits positive transfer between tasks as a result of its multi-task approach.
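The abstract names V-trace but does not define it. As a rough illustration, the sketch below computes V-trace value targets for a single trajectory, following the truncated importance-weight recursion given in the paper itself rather than anything on this page. The function name `vtrace_targets`, its argument names, and the default `discount`, `rho_bar`, and `c_bar` values are assumptions chosen for this example; it is not taken from the official `scalable_agent` code.

```python
import math


def vtrace_targets(behaviour_logp, target_logp, rewards, values,
                   bootstrap_value, discount=0.99, rho_bar=1.0, c_bar=1.0):
    """Illustrative V-trace targets v_s for one trajectory of length T.

    behaviour_logp[t]: log mu(a_t | x_t) under the actor (behaviour) policy.
    target_logp[t]:    log pi(a_t | x_t) under the learner (target) policy.
    values[t]:         learner value estimate V(x_t).
    bootstrap_value:   V(x_T), the estimate for the state after the last step.
    """
    steps = len(rewards)
    # Importance ratios pi / mu, truncated at rho_bar and c_bar.
    ratios = [math.exp(t - b) for t, b in zip(target_logp, behaviour_logp)]
    rhos = [min(rho_bar, r) for r in ratios]
    cs = [min(c_bar, r) for r in ratios]

    # Temporal-difference errors weighted by the truncated rho.
    next_values = list(values[1:]) + [bootstrap_value]
    deltas = [rho * (r + discount * nv - v)
              for rho, r, nv, v in zip(rhos, rewards, next_values, values)]

    # Backward recursion:
    # v_s - V(x_s) = delta_s + discount * c_s * (v_{s+1} - V(x_{s+1})).
    targets = [0.0] * steps
    correction = 0.0
    for s in reversed(range(steps)):
        correction = deltas[s] + discount * cs[s] * correction
        targets[s] = values[s] + correction
    return targets


# On-policy case: both policies agree, so the targets reduce to
# ordinary n-step bootstrapped returns.
on_policy = vtrace_targets([0.0] * 3, [0.0] * 3, [1.0, 1.0, 1.0],
                           [0.0, 0.0, 0.0], 0.0, discount=0.5)
```

When the actor and learner policies coincide, every ratio equals 1 and the targets are plain n-step returns. As the learner drifts away from the actor, the truncated weights shrink the correction toward the current value estimates, which is what lets learning stay stable while acting and learning are decoupled.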
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| Atari 2600 Alien | IMPALA (deep) | Score | 15,962.1 | — | Unverified |
| Atari 2600 Amidar | IMPALA (deep) | Score | 1,554.79 | — | Unverified |
| Atari 2600 Assault | IMPALA (deep) | Score | 19,148.47 | — | Unverified |
| Atari 2600 Asterix | IMPALA (deep) | Score | 300,732 | — | Unverified |
| Atari 2600 Asteroids | IMPALA (deep) | Score | 108,590.05 | — | Unverified |
| Atari 2600 Atlantis | IMPALA (deep) | Score | 849,967.5 | — | Unverified |
| Atari 2600 Bank Heist | IMPALA (deep) | Score | 1,223.15 | — | Unverified |
| Atari 2600 Battle Zone | IMPALA (deep) | Score | 20,885 | — | Unverified |
| Atari 2600 Beam Rider | IMPALA (deep) | Score | 32,463.47 | — | Unverified |
| Atari 2600 Berzerk | IMPALA (deep) | Score | 1,852.7 | — | Unverified |
| Atari 2600 Bowling | IMPALA (deep) | Score | 59.92 | — | Unverified |
| Atari 2600 Boxing | IMPALA (deep) | Score | 99.96 | — | Unverified |
| Atari 2600 Breakout | IMPALA (deep) | Score | 787.34 | — | Unverified |
| Atari 2600 Centipede | IMPALA (deep) | Score | 11,049.75 | — | Unverified |
| Atari 2600 Chopper Command | IMPALA (deep) | Score | 28,255 | — | Unverified |
| Atari 2600 Crazy Climber | IMPALA (deep) | Score | 136,950 | — | Unverified |
| Atari 2600 Defender | IMPALA (deep) | Score | 185,203 | — | Unverified |
| Atari 2600 Demon Attack | IMPALA (deep) | Score | 132,826.98 | — | Unverified |
| Atari 2600 Double Dunk | IMPALA (deep) | Score | -0.33 | — | Unverified |
| Atari 2600 Enduro | IMPALA (deep) | Score | 0 | — | Unverified |
| Atari 2600 Fishing Derby | IMPALA (deep) | Score | 44.85 | — | Unverified |
| Atari 2600 Freeway | IMPALA (deep) | Score | 0 | — | Unverified |
| Atari 2600 Frostbite | IMPALA (deep) | Score | 317.75 | — | Unverified |
| Atari 2600 Gopher | IMPALA (deep) | Score | 66,782.3 | — | Unverified |
| Atari 2600 Gravitar | IMPALA (deep) | Score | 359.5 | — | Unverified |
| Atari 2600 HERO | IMPALA (deep) | Score | 33,730.55 | — | Unverified |
| Atari 2600 Ice Hockey | IMPALA (deep) | Score | 3.48 | — | Unverified |
| Atari 2600 James Bond | IMPALA (deep) | Score | 601.5 | — | Unverified |
| Atari 2600 Kangaroo | IMPALA (deep) | Score | 1,632 | — | Unverified |
| Atari 2600 Krull | IMPALA (deep) | Score | 8,147.4 | — | Unverified |
| Atari 2600 Kung-Fu Master | IMPALA (deep) | Score | 43,375.5 | — | Unverified |
| Atari 2600 Montezuma's Revenge | IMPALA (deep) | Score | 0 | — | Unverified |
| Atari 2600 Ms. Pacman | IMPALA (deep) | Score | 7,342.32 | — | Unverified |
| Atari 2600 Name This Game | IMPALA (deep) | Score | 21,537.2 | — | Unverified |
| Atari 2600 Phoenix | IMPALA (deep) | Score | 210,996.45 | — | Unverified |
| Atari 2600 Pitfall! | IMPALA (deep) | Score | -1.66 | — | Unverified |
| Atari 2600 Pong | IMPALA (deep) | Score | 20.98 | — | Unverified |
| Atari 2600 Private Eye | IMPALA (deep) | Score | 98.5 | — | Unverified |
| Atari 2600 Q*Bert | IMPALA (deep) | Score | 351,200.12 | — | Unverified |
| Atari 2600 River Raid | IMPALA (deep) | Score | 29,608.05 | — | Unverified |
| Atari 2600 Road Runner | IMPALA (deep) | Score | 57,121 | — | Unverified |
| Atari 2600 Robotank | IMPALA (deep) | Score | 12.96 | — | Unverified |
| Atari 2600 Seaquest | IMPALA (deep) | Score | 1,753.2 | — | Unverified |
| Atari 2600 Skiing | IMPALA (deep) | Score | -10,180.38 | — | Unverified |
| Atari 2600 Solaris | IMPALA (deep) | Score | 2,365 | — | Unverified |
| Atari 2600 Space Invaders | IMPALA (deep) | Score | 43,595.78 | — | Unverified |
| Atari 2600 Star Gunner | IMPALA (deep) | Score | 200,625 | — | Unverified |
| Atari 2600 Surround | IMPALA (deep) | Score | 7.56 | — | Unverified |
| Atari 2600 Tennis | IMPALA (deep) | Score | 0.55 | — | Unverified |
| Atari 2600 Time Pilot | IMPALA (deep) | Score | 48,481.5 | — | Unverified |