Dueling Network Architectures for Deep Reinforcement Learning
Ziyu Wang, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, Nando de Freitas
Code
- github.com/labmlai/annotated_deep_learning_paper_implementations (pytorch, ★ 66,103)
- github.com/facebookresearch/ReAgent (pytorch, ★ 3,690)
- github.com/facebookresearch/Horizon (pytorch, ★ 3,686)
- github.com/opendilab/DI-engine (pytorch, ★ 3,606)
- github.com/tensorlayer/RLzoo (tf, ★ 644)
- github.com/wtingda/DeepRLBreakout (tf, ★ 6)
- github.com/xusophia/DataSciFinalProj (pytorch, ★ 4)
- github.com/prajwalgatti/DRL-Continuous-Control (no framework listed, ★ 1)
- github.com/la3lma/Chez (tf, ★ 1)
- github.com/la3lma/chezjulia (tf, ★ 1)
Abstract
In recent years, deep representations have been used successfully in many reinforcement learning applications. Still, many of these applications use conventional architectures, such as convolutional networks, LSTMs, or auto-encoders. In this paper, we present a new neural network architecture for model-free reinforcement learning. Our dueling network represents two separate estimators: one for the state value function and one for the state-dependent action advantage function. The main benefit of this factoring is to generalize learning across actions without imposing any change to the underlying reinforcement learning algorithm. Our results show that this architecture leads to better policy evaluation in the presence of many similar-valued actions. Moreover, the dueling architecture enables our RL agent to outperform the state of the art on the Atari 2600 domain.
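The abstract describes the two-stream factoring but not how the value and advantage estimates are recombined into Q-values. The full paper combines them by subtracting the mean advantage, Q(s, a) = V(s) + (A(s, a) - mean over a' of A(s, a')), and also discusses a max-based variant. Below is a minimal, framework-free sketch of the mean-subtracted head, assuming a single linear layer per stream on top of already-computed shared features (the paper instead shares convolutional layers and uses fully connected streams). The names `linear`, `dueling_q_values`, and `td_target`, and the toy weights, are hypothetical illustrations, not the authors' code. The target function shows why the underlying algorithm needs no change: it only consumes Q-values.

```python
from typing import List, Sequence


def linear(x: Sequence[float], weights: Sequence[Sequence[float]],
           bias: Sequence[float]) -> List[float]:
    """Dense layer: returns W @ x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def dueling_q_values(features: Sequence[float],
                     value_w: Sequence[Sequence[float]], value_b: Sequence[float],
                     adv_w: Sequence[Sequence[float]], adv_b: Sequence[float]) -> List[float]:
    """Combine a value stream and an advantage stream into Q-values.

    Uses the mean-subtracted aggregation
        Q(s, a) = V(s) + (A(s, a) - mean_a' A(s, a')).
    One linear layer per stream is a simplifying assumption of this sketch.
    """
    value = linear(features, value_w, value_b)[0]      # scalar V(s)
    advantages = linear(features, adv_w, adv_b)        # one A(s, a) per action
    mean_adv = sum(advantages) / len(advantages)
    return [value + (a - mean_adv) for a in advantages]


def td_target(reward: float, gamma: float, next_q: Sequence[float], done: bool) -> float:
    """Standard one-step Q-learning target. It only consumes Q-values,
    so it is identical for a dueling head and a single-stream head."""
    return reward if done else reward + gamma * max(next_q)


if __name__ == "__main__":
    shared = [1.0, 2.0]  # hypothetical output of the shared feature layers
    q = dueling_q_values(shared,
                         value_w=[[0.5, 0.5]], value_b=[0.0],
                         adv_w=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
                         adv_b=[0.0, 0.0, 0.0])
    print(q)  # [0.5, 1.5, 2.5]
    print(td_target(1.0, 0.9, q, done=False))
```

Because the mean advantage is subtracted, the average of the Q-values equals V(s) (1.5 in the toy example), and the greedy action is simply the argmax of the advantage stream. Fixing the advantages to zero mean is what makes V and A separately identifiable from Q.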
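Benchmark Results
Model suffixes: "noop" denotes evaluation from episodes started with up to 30 no-op actions, and "hs" denotes evaluation from human starts. "Prior+Duel" combines the dueling network with prioritized experience replay, and "DDQN (tuned)" is the tuned Double DQN baseline. The Claimed column gives the scores reported in the paper; none has been independently verified.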
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| Atari 2600 Alien | Prior+Duel hs | Score | 823.7 | — | Unverified |
| Atari 2600 Alien | Prior+Duel noop | Score | 3,941 | — | Unverified |
| Atari 2600 Alien | Duel hs | Score | 1,486.5 | — | Unverified |
| Atari 2600 Alien | DDQN (tuned) noop | Score | 3,747.7 | — | Unverified |
| Atari 2600 Alien | Duel noop | Score | 4,461.4 | — | Unverified |
| Atari 2600 Amidar | Duel hs | Score | 172.7 | — | Unverified |
| Atari 2600 Amidar | Duel noop | Score | 2,354.5 | — | Unverified |
| Atari 2600 Amidar | DDQN (tuned) noop | Score | 1,793.3 | — | Unverified |
| Atari 2600 Amidar | Prior+Duel hs | Score | 238.4 | — | Unverified |
| Atari 2600 Amidar | Prior+Duel noop | Score | 2,296.8 | — | Unverified |
| Atari 2600 Assault | Duel noop | Score | 4,621 | — | Unverified |
| Atari 2600 Assault | Duel hs | Score | 3,994.8 | — | Unverified |
| Atari 2600 Assault | Prior+Duel hs | Score | 10,950.6 | — | Unverified |
| Atari 2600 Assault | DDQN (tuned) noop | Score | 5,393.2 | — | Unverified |
| Atari 2600 Assault | Prior+Duel noop | Score | 11,477 | — | Unverified |
| Atari 2600 Asterix | DDQN (tuned) noop | Score | 17,356.5 | — | Unverified |
| Atari 2600 Asterix | Duel noop | Score | 28,188 | — | Unverified |
| Atari 2600 Asterix | Prior+Duel noop | Score | 375,080 | — | Unverified |
| Atari 2600 Asterix | Prior+Duel hs | Score | 364,200 | — | Unverified |
| Atari 2600 Asterix | Duel hs | Score | 15,840 | — | Unverified |
| Atari 2600 Asteroids | DDQN (tuned) noop | Score | 734.7 | — | Unverified |
| Atari 2600 Asteroids | Duel noop | Score | 2,837.7 | — | Unverified |
| Atari 2600 Asteroids | Duel hs | Score | 2,035.4 | — | Unverified |
| Atari 2600 Asteroids | Prior+Duel noop | Score | 1,192.7 | — | Unverified |
| Atari 2600 Atlantis | Duel hs | Score | 445,360 | — | Unverified |
| Atari 2600 Atlantis | Prior+Duel noop | Score | 395,762 | — | Unverified |
| Atari 2600 Atlantis | Duel noop | Score | 382,572 | — | Unverified |
| Atari 2600 Atlantis | DDQN (tuned) noop | Score | 106,056 | — | Unverified |
| Atari 2600 Bank Heist | DDQN (tuned) noop | Score | 1,030.6 | — | Unverified |
| Atari 2600 Bank Heist | Duel hs | Score | 1,129.3 | — | Unverified |
| Atari 2600 Bank Heist | Prior+Duel noop | Score | 1,503.1 | — | Unverified |
| Atari 2600 Bank Heist | Duel noop | Score | 1,611.9 | — | Unverified |
| Atari 2600 Battle Zone | DDQN (tuned) noop | Score | 31,700 | — | Unverified |
| Atari 2600 Battle Zone | Prior+Duel noop | Score | 35,520 | — | Unverified |
| Atari 2600 Battle Zone | Duel hs | Score | 31,320 | — | Unverified |
| Atari 2600 Battle Zone | Duel noop | Score | 37,150 | — | Unverified |
| Atari 2600 Beam Rider | Prior+Duel noop | Score | 30,276.5 | — | Unverified |
| Atari 2600 Beam Rider | DDQN (tuned) noop | Score | 13,772.8 | — | Unverified |
| Atari 2600 Beam Rider | Duel noop | Score | 12,164 | — | Unverified |
| Atari 2600 Beam Rider | Duel hs | Score | 14,591.3 | — | Unverified |
| Atari 2600 Berzerk | Prior+Duel noop | Score | 3,409 | — | Unverified |
| Atari 2600 Berzerk | Duel noop | Score | 1,472.6 | — | Unverified |
| Atari 2600 Berzerk | DDQN (tuned) noop | Score | 1,225.4 | — | Unverified |
| Atari 2600 Berzerk | Duel hs | Score | 910.6 | — | Unverified |
| Atari 2600 Bowling | Duel noop | Score | 65.5 | — | Unverified |
| Atari 2600 Bowling | DDQN (tuned) noop | Score | 68.1 | — | Unverified |
| Atari 2600 Bowling | Duel hs | Score | 65.7 | — | Unverified |
| Atari 2600 Bowling | Prior+Duel noop | Score | 46.7 | — | Unverified |
| Atari 2600 Boxing | Duel noop | Score | 99.4 | — | Unverified |
| Atari 2600 Boxing | Duel hs | Score | 77.3 | — | Unverified |