Dueling Network Architectures for Deep Reinforcement Learning

2015-11-20 · Code Available

Ziyu Wang, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, Nando de Freitas

Abstract

In recent years there have been many successes of using deep representations in reinforcement learning. Still, many of these applications use conventional architectures, such as convolutional networks, LSTMs, or auto-encoders. In this paper, we present a new neural network architecture for model-free reinforcement learning. Our dueling network represents two separate estimators: one for the state value function and one for the state-dependent action advantage function. The main benefit of this factoring is to generalize learning across actions without imposing any change to the underlying reinforcement learning algorithm. Our results show that this architecture leads to better policy evaluation in the presence of many similar-valued actions. Moreover, the dueling architecture enables our RL agent to outperform the state-of-the-art on the Atari 2600 domain.
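The abstract describes two estimators, a state value and a state-dependent action advantage, but does not say how they are combined into action values. The sketch below illustrates one common way to do so: subtract the mean advantage before adding the state value, so that the two parts remain identifiable. The function name `dueling_q_values` and the plain-list inputs are assumptions made for illustration. In a real agent both quantities would be outputs of network streams on top of a shared feature extractor.

```python
from typing import List


def dueling_q_values(state_value: float, advantages: List[float]) -> List[float]:
    """Combine V(s) and A(s, a) into Q(s, a) for every action a.

    Hypothetical helper: uses the mean-subtracted aggregation
    Q(s, a) = V(s) + (A(s, a) - mean over a' of A(s, a')).
    """
    if not advantages:
        raise ValueError("advantages must contain at least one action")
    mean_advantage = sum(advantages) / len(advantages)
    # Centering the advantages keeps the ranking of actions unchanged
    # while making V(s) the average of the resulting Q-values.
    return [state_value + (a - mean_advantage) for a in advantages]


# Toy example with three actions (values chosen for illustration only).
q_values = dueling_q_values(2.0, [1.0, 0.0, -1.0])
greedy_action = max(range(len(q_values)), key=q_values.__getitem__)
```

Because the aggregation only shifts all advantages by the same constant, the greedy action is the one with the largest advantage, which is why the factoring can be dropped into an existing Q-learning algorithm without changing it.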

Benchmark Results

| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| Atari 2600 Alien | Prior+Duel hs | Score | 823.7 | | Unverified |
| Atari 2600 Alien | Prior+Duel noop | Score | 3,941 | | Unverified |
| Atari 2600 Alien | Duel hs | Score | 1,486.5 | | Unverified |
| Atari 2600 Alien | DDQN (tuned) noop | Score | 3,747.7 | | Unverified |
| Atari 2600 Alien | Duel noop | Score | 4,461.4 | | Unverified |
| Atari 2600 Amidar | Duel hs | Score | 172.7 | | Unverified |
| Atari 2600 Amidar | Duel noop | Score | 2,354.5 | | Unverified |
| Atari 2600 Amidar | DDQN (tuned) noop | Score | 1,793.3 | | Unverified |
| Atari 2600 Amidar | Prior+Duel hs | Score | 238.4 | | Unverified |
| Atari 2600 Amidar | Prior+Duel noop | Score | 2,296.8 | | Unverified |
| Atari 2600 Assault | Duel noop | Score | 4,621 | | Unverified |
| Atari 2600 Assault | Duel hs | Score | 3,994.8 | | Unverified |
| Atari 2600 Assault | Prior+Duel hs | Score | 10,950.6 | | Unverified |
| Atari 2600 Assault | DDQN (tuned) noop | Score | 5,393.2 | | Unverified |
| Atari 2600 Assault | Prior+Duel noop | Score | 11,477 | | Unverified |
| Atari 2600 Asterix | DDQN (tuned) noop | Score | 17,356.5 | | Unverified |
| Atari 2600 Asterix | Duel noop | Score | 28,188 | | Unverified |
| Atari 2600 Asterix | Prior+Duel noop | Score | 375,080 | | Unverified |
| Atari 2600 Asterix | Prior+Duel hs | Score | 364,200 | | Unverified |
| Atari 2600 Asterix | Duel hs | Score | 15,840 | | Unverified |
| Atari 2600 Asteroids | DDQN (tuned) noop | Score | 734.7 | | Unverified |
| Atari 2600 Asteroids | Duel noop | Score | 2,837.7 | | Unverified |
| Atari 2600 Asteroids | Duel hs | Score | 2,035.4 | | Unverified |
| Atari 2600 Asteroids | Prior+Duel noop | Score | 1,192.7 | | Unverified |
| Atari 2600 Atlantis | Duel hs | Score | 445,360 | | Unverified |
| Atari 2600 Atlantis | Prior+Duel noop | Score | 395,762 | | Unverified |
| Atari 2600 Atlantis | Duel noop | Score | 382,572 | | Unverified |
| Atari 2600 Atlantis | DDQN (tuned) noop | Score | 106,056 | | Unverified |
| Atari 2600 Bank Heist | DDQN (tuned) noop | Score | 1,030.6 | | Unverified |
| Atari 2600 Bank Heist | Duel hs | Score | 1,129.3 | | Unverified |
| Atari 2600 Bank Heist | Prior+Duel noop | Score | 1,503.1 | | Unverified |
| Atari 2600 Bank Heist | Duel noop | Score | 1,611.9 | | Unverified |
| Atari 2600 Battle Zone | DDQN (tuned) noop | Score | 31,700 | | Unverified |
| Atari 2600 Battle Zone | Prior+Duel noop | Score | 35,520 | | Unverified |
| Atari 2600 Battle Zone | Duel hs | Score | 31,320 | | Unverified |
| Atari 2600 Battle Zone | Duel noop | Score | 37,150 | | Unverified |
| Atari 2600 Beam Rider | Prior+Duel noop | Score | 30,276.5 | | Unverified |
| Atari 2600 Beam Rider | DDQN (tuned) noop | Score | 13,772.8 | | Unverified |
| Atari 2600 Beam Rider | Duel noop | Score | 12,164 | | Unverified |
| Atari 2600 Beam Rider | Duel hs | Score | 14,591.3 | | Unverified |
| Atari 2600 Berzerk | Prior+Duel noop | Score | 3,409 | | Unverified |
| Atari 2600 Berzerk | Duel noop | Score | 1,472.6 | | Unverified |
| Atari 2600 Berzerk | DDQN (tuned) noop | Score | 1,225.4 | | Unverified |
| Atari 2600 Berzerk | Duel hs | Score | 910.6 | | Unverified |
| Atari 2600 Bowling | Duel noop | Score | 65.5 | | Unverified |
| Atari 2600 Bowling | DDQN (tuned) noop | Score | 68.1 | | Unverified |
| Atari 2600 Bowling | Duel hs | Score | 65.7 | | Unverified |
| Atari 2600 Bowling | Prior+Duel noop | Score | 46.7 | | Unverified |
| Atari 2600 Boxing | Duel noop | Score | 99.4 | | Unverified |
| Atari 2600 Boxing | Duel hs | Score | 77.3 | | Unverified |
