Prioritized Experience Replay
Tom Schaul, John Quan, Ioannis Antonoglou, David Silver
Code Available
- github.com/labmlai/annotated_deep_learning_paper_implementations (pytorch) ★ 66,103
- github.com/tensorlayer/RLzoo (tf) ★ 644
- github.com/instadeepai/flashbax (jax) ★ 274
- github.com/atavakol/action-hypergraph-networks (tf) ★ 23
- github.com/xusophia/DataSciFinalProj (pytorch) ★ 4
- github.com/VictorZuanazzi/Project_RL (pytorch) ★ 0
- github.com/snhwang/p1_navigation_SNH (pytorch) ★ 0
- github.com/CSCI4850/S20-team3-project (none) ★ 0
- github.com/VasaKiDD/TD3-deep-rl-research (pytorch) ★ 0
- github.com/Howuhh/prioritized_experience_replay (pytorch) ★ 0
Abstract
Experience replay lets online reinforcement learning agents remember and reuse experiences from the past. In prior work, experience transitions were uniformly sampled from a replay memory. However, this approach simply replays transitions at the same frequency that they were originally experienced, regardless of their significance. In this paper we develop a framework for prioritizing experience, so as to replay important transitions more frequently, and therefore learn more efficiently. We use prioritized experience replay in Deep Q-Networks (DQN), a reinforcement learning algorithm that achieved human-level performance across many Atari games. DQN with prioritized experience replay achieves a new state-of-the-art, outperforming DQN with uniform replay on 41 out of 49 games.
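To make the idea in the abstract concrete, here is a minimal sketch of a replay buffer that samples transitions in proportion to a priority instead of uniformly. The abstract does not say how priority is measured. This sketch assumes the magnitude of the TD error, with sampling probability proportional to priority raised to a power `alpha` and importance-sampling weights controlled by `beta`. The class name `PrioritizedReplay`, its method names, and the default values are illustrative choices, not taken from any of the repositories listed above. Sampling here is a plain O(N) pass over a list, and weights are normalized by the batch maximum as a simplification.

```python
import random


class PrioritizedReplay:
    """Toy proportional prioritized replay buffer (list-based, O(N) sampling)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha  # alpha = 0 recovers uniform sampling
        self.eps = eps      # keeps every updated priority strictly positive
        self.items = []
        self.priorities = []
        self.next_idx = 0   # slot to overwrite once the buffer is full

    def add(self, transition):
        # Assumption: new transitions get the current maximum priority,
        # so each one is likely to be replayed before its error is known.
        p = max(self.priorities, default=1.0)
        if len(self.items) < self.capacity:
            self.items.append(transition)
            self.priorities.append(p)
        else:
            self.items[self.next_idx] = transition
            self.priorities[self.next_idx] = p
        self.next_idx = (self.next_idx + 1) % self.capacity

    def probabilities(self):
        # P(i) = p_i^alpha / sum_k p_k^alpha
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        return [s / total for s in scaled]

    def sample(self, batch_size, beta=0.4, rng=random):
        probs = self.probabilities()
        n = len(self.items)
        idxs = rng.choices(range(n), weights=probs, k=batch_size)
        # Importance-sampling weights w_i = (N * P(i))^(-beta) offset the
        # bias from non-uniform sampling; scaled so the largest weight is 1.
        weights = [(n * probs[i]) ** (-beta) for i in idxs]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        return idxs, [self.items[i] for i in idxs], weights

    def update(self, idxs, td_errors):
        # After a learning step, refresh priorities from the new TD errors.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

In a training loop, an agent would call `add` for each new transition, `sample` to draw a batch, multiply each sample's loss by its weight, and then call `update` with the TD errors computed for that batch. Practical implementations usually replace the list scan with a sum-tree so that sampling and updates cost O(log N).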
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| Atari 2600 Alien | Prior hs | Score | 1,334.7 | — | Unverified |
| Atari 2600 Alien | Prior noop | Score | 4,203.8 | — | Unverified |
| Atari 2600 Amidar | Prior noop | Score | 1,838.9 | — | Unverified |
| Atari 2600 Amidar | Prior hs | Score | 129.1 | — | Unverified |
| Atari 2600 Assault | Prior hs | Score | 6,548.9 | — | Unverified |
| Atari 2600 Assault | Prior noop | Score | 7,672.1 | — | Unverified |
| Atari 2600 Asterix | Prior hs | Score | 22,484.5 | — | Unverified |
| Atari 2600 Asterix | Prior noop | Score | 31,527 | — | Unverified |
| Atari 2600 Asteroids | Prior hs | Score | 1,745.1 | — | Unverified |
| Atari 2600 Asteroids | Prior noop | Score | 2,654.3 | — | Unverified |
| Atari 2600 Atlantis | Prior noop | Score | 357,324 | — | Unverified |
| Atari 2600 Atlantis | Prior hs | Score | 330,647 | — | Unverified |
| Atari 2600 Bank Heist | Prior hs | Score | 876.6 | — | Unverified |
| Atari 2600 Bank Heist | Prior noop | Score | 1,054.6 | — | Unverified |
| Atari 2600 Battle Zone | Prior hs | Score | 25,520 | — | Unverified |
| Atari 2600 Battle Zone | Prior noop | Score | 31,530 | — | Unverified |
| Atari 2600 Beam Rider | Prior noop | Score | 23,384.2 | — | Unverified |
| Atari 2600 Beam Rider | Prior hs | Score | 31,181.3 | — | Unverified |
| Atari 2600 Berzerk | Prior hs | Score | 865.9 | — | Unverified |
| Atari 2600 Berzerk | Prior noop | Score | 1,305.6 | — | Unverified |
| Atari 2600 Bowling | Prior noop | Score | 47.9 | — | Unverified |
| Atari 2600 Bowling | Prior hs | Score | 52 | — | Unverified |
| Atari 2600 Boxing | Prior hs | Score | 72.3 | — | Unverified |
| Atari 2600 Boxing | Prior noop | Score | 95.6 | — | Unverified |
| Atari 2600 Breakout | Prior hs | Score | 343 | — | Unverified |
| Atari 2600 Breakout | Prior noop | Score | 373.9 | — | Unverified |
| Atari 2600 Centipede | Prior noop | Score | 4,463.2 | — | Unverified |
| Atari 2600 Centipede | Prior hs | Score | 3,489.1 | — | Unverified |
| Atari 2600 Chopper Command | Prior hs | Score | 4,635 | — | Unverified |
| Atari 2600 Chopper Command | Prior noop | Score | 8,600 | — | Unverified |
| Atari 2600 Crazy Climber | Prior hs | Score | 127,512 | — | Unverified |
| Atari 2600 Crazy Climber | Prior noop | Score | 141,161 | — | Unverified |
| Atari 2600 Demon Attack | Prior noop | Score | 71,846.4 | — | Unverified |
| Atari 2600 Demon Attack | Prior hs | Score | 61,277.5 | — | Unverified |
| Atari 2600 Double Dunk | Prior hs | Score | 16 | — | Unverified |
| Atari 2600 Double Dunk | Prior noop | Score | 18.5 | — | Unverified |
| Atari 2600 Enduro | Prior noop | Score | 2,093 | — | Unverified |
| Atari 2600 Enduro | Prior hs | Score | 1,831 | — | Unverified |
| Atari 2600 Fishing Derby | Prior noop | Score | 39.5 | — | Unverified |
| Atari 2600 Fishing Derby | Prior hs | Score | 9.8 | — | Unverified |
| Atari 2600 Freeway | Prior noop | Score | 33.7 | — | Unverified |
| Atari 2600 Freeway | Prior hs | Score | 28.9 | — | Unverified |
| Atari 2600 Frostbite | Prior noop | Score | 4,380.1 | — | Unverified |
| Atari 2600 Frostbite | Prior hs | Score | 3,510 | — | Unverified |
| Atari 2600 Gopher | Prior hs | Score | 34,858.8 | — | Unverified |
| Atari 2600 Gopher | Prior noop | Score | 32,487.2 | — | Unverified |
| Atari 2600 Gravitar | Prior hs | Score | 269.5 | — | Unverified |
| Atari 2600 Gravitar | Prior noop | Score | 548.5 | — | Unverified |
| Atari 2600 HERO | Prior noop | Score | 23,037.7 | — | Unverified |
| Atari 2600 HERO | Prior hs | Score | 20,889.9 | — | Unverified |