Deep Reinforcement Learning with Double Q-learning
Hado van Hasselt, Arthur Guez, David Silver
Code
- github.com/labmlai/annotated_deep_learning_paper_implementations (PyTorch, ★ 66,103)
- github.com/facebookresearch/ReAgent (PyTorch, ★ 3,690)
- github.com/opendilab/DI-engine (PyTorch, ★ 3,606)
- github.com/toni-sm/skrl (JAX, ★ 1,014)
- github.com/tensorlayer/RLzoo (TensorFlow, ★ 644)
- github.com/microsoft/med-deadend (PyTorch, ★ 50)
- github.com/OscarHuangWind/Preference-Guided-DQN-Atari (PyTorch, ★ 12)
- github.com/wtingda/DeepRLBreakout (TensorFlow, ★ 6)
- github.com/wmol4/Pytorch_DDQN_Unity_Navigation (PyTorch, ★ 1)
- github.com/jadag/DDQN_mario (TensorFlow, ★ 0)
Abstract
The popular Q-learning algorithm is known to overestimate action values under certain conditions. It was not previously known whether, in practice, such overestimations are common, whether they harm performance, and whether they can generally be prevented. In this paper, we answer all these questions affirmatively. In particular, we first show that the recent DQN algorithm, which combines Q-learning with a deep neural network, suffers from substantial overestimations in some games in the Atari 2600 domain. We then show that the idea behind the Double Q-learning algorithm, which was introduced in a tabular setting, can be generalized to work with large-scale function approximation. We propose a specific adaptation to the DQN algorithm and show that the resulting algorithm not only reduces the observed overestimations, as hypothesized, but that this also leads to much better performance on several games.
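The adaptation described in the abstract can be illustrated with a minimal NumPy sketch of the two bootstrap targets. This is not the authors' code. The function names, the toy Q-value arrays, and the discount of 0.99 are assumptions made for illustration. The sketch assumes the usual formulation: the DQN target takes the maximum of the target network's next-state values, while the Double DQN target lets the online network choose the action and the target network evaluate it.

```python
import numpy as np

def dqn_target(reward, gamma, q_target_next, done):
    """DQN target: the target network both selects and evaluates the next action."""
    return reward + gamma * (1.0 - done) * q_target_next.max(axis=1)

def double_dqn_target(reward, gamma, q_online_next, q_target_next, done):
    """Double DQN target: the online network selects, the target network evaluates."""
    best_actions = q_online_next.argmax(axis=1)
    # Pick, for each transition, the target-network value of the selected action.
    evaluated = q_target_next[np.arange(len(best_actions)), best_actions]
    return reward + gamma * (1.0 - done) * evaluated

# Toy batch: two transitions, three actions each (made-up numbers).
reward = np.array([1.0, 0.0])
done = np.array([0.0, 1.0])  # the second transition is terminal
q_online_next = np.array([[0.2, 0.9, 0.1], [0.5, 0.4, 0.3]])
q_target_next = np.array([[1.5, 0.7, 0.3], [0.6, 0.2, 0.1]])

y_dqn = dqn_target(reward, 0.99, q_target_next, done)
y_ddqn = double_dqn_target(reward, 0.99, q_online_next, q_target_next, done)
print(y_dqn, y_ddqn)
```

Because the maximum over actions is never smaller than the value of any single action, the Double DQN target computed this way is never larger than the DQN target for the same transition, which is the mechanism by which the overestimation is reduced.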
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| Atari 2600 Alien | DQN hs | Score | 634 | — | Unverified |
| Atari 2600 Alien | DQN noop | Score | 1,620 | — | Unverified |
| Atari 2600 Alien | DDQN (tuned) hs | Score | 1,033.4 | — | Unverified |
| Atari 2600 Amidar | DQN hs | Score | 178.4 | — | Unverified |
| Atari 2600 Amidar | DDQN (tuned) hs | Score | 169.1 | — | Unverified |
| Atari 2600 Amidar | DQN noop | Score | 978 | — | Unverified |
| Atari 2600 Assault | DQN noop | Score | 4,280.4 | — | Unverified |
| Atari 2600 Assault | DDQN (tuned) hs | Score | 6,060.8 | — | Unverified |
| Atari 2600 Assault | DQN hs | Score | 3,489.3 | — | Unverified |
| Atari 2600 Asterix | DQN hs | Score | 3,170.5 | — | Unverified |
| Atari 2600 Asterix | DDQN (tuned) hs | Score | 16,837 | — | Unverified |
| Atari 2600 Asterix | DQN noop | Score | 4,359 | — | Unverified |
| Atari 2600 Asteroids | DDQN (tuned) hs | Score | 1,193.2 | — | Unverified |
| Atari 2600 Asteroids | DQN noop | Score | 1,364.5 | — | Unverified |
| Atari 2600 Asteroids | Prior+Duel hs | Score | 1,021.9 | — | Unverified |
| Atari 2600 Asteroids | DQN hs | Score | 1,458.7 | — | Unverified |
| Atari 2600 Atlantis | DQN noop | Score | 279,987 | — | Unverified |
| Atari 2600 Atlantis | DDQN (tuned) hs | Score | 319,688 | — | Unverified |
| Atari 2600 Atlantis | DQN hs | Score | 292,491 | — | Unverified |
| Atari 2600 Atlantis | Prior+Duel hs | Score | 423,252 | — | Unverified |
| Atari 2600 Bank Heist | Prior+Duel hs | Score | 1,004.6 | — | Unverified |
| Atari 2600 Bank Heist | DQN hs | Score | 312.7 | — | Unverified |
| Atari 2600 Bank Heist | DDQN (tuned) hs | Score | 886 | — | Unverified |
| Atari 2600 Bank Heist | DQN noop | Score | 455 | — | Unverified |
| Atari 2600 Battle Zone | DDQN (tuned) hs | Score | 24,740 | — | Unverified |
| Atari 2600 Battle Zone | DQN hs | Score | 23,750 | — | Unverified |
| Atari 2600 Battle Zone | Prior+Duel hs | Score | 30,650 | — | Unverified |
| Atari 2600 Battle Zone | DQN noop | Score | 29,900 | — | Unverified |
| Atari 2600 Beam Rider | Prior+Duel hs | Score | 37,412.2 | — | Unverified |
| Atari 2600 Beam Rider | DDQN (tuned) hs | Score | 17,417.2 | — | Unverified |
| Atari 2600 Beam Rider | DQN hs | Score | 9,743.2 | — | Unverified |
| Atari 2600 Beam Rider | DQN noop | Score | 8,627.5 | — | Unverified |
| Atari 2600 Berzerk | Prior+Duel hs | Score | 2,178.6 | — | Unverified |
| Atari 2600 Berzerk | DQN hs | Score | 493.4 | — | Unverified |
| Atari 2600 Berzerk | DDQN (tuned) hs | Score | 1,011.1 | — | Unverified |
| Atari 2600 Berzerk | DQN noop | Score | 585.6 | — | Unverified |
| Atari 2600 Bowling | DQN noop | Score | 50.4 | — | Unverified |
| Atari 2600 Bowling | DQN hs | Score | 56.5 | — | Unverified |
| Atari 2600 Bowling | Prior+Duel hs | Score | 50.4 | — | Unverified |
| Atari 2600 Bowling | DDQN (tuned) hs | Score | 69.6 | — | Unverified |
| Atari 2600 Boxing | Prior+Duel hs | Score | 79.2 | — | Unverified |
| Atari 2600 Boxing | DQN hs | Score | 70.3 | — | Unverified |
| Atari 2600 Boxing | DQN noop | Score | 88 | — | Unverified |
| Atari 2600 Boxing | DDQN (tuned) hs | Score | 73.5 | — | Unverified |
| Atari 2600 Breakout | DQN hs | Score | 354.5 | — | Unverified |
| Atari 2600 Breakout | Prior+Duel hs | Score | 354.6 | — | Unverified |
| Atari 2600 Breakout | DDQN (tuned) hs | Score | 368.9 | — | Unverified |
| Atari 2600 Breakout | DQN noop | Score | 385.5 | — | Unverified |
| Atari 2600 Centipede | DQN noop | Score | 4,657.7 | — | Unverified |
| Atari 2600 Centipede | Prior+Duel hs | Score | 5,570.2 | — | Unverified |