Deep Reinforcement Learning with Double Q-learning

2015-09-22

Hado van Hasselt, Arthur Guez, David Silver

Abstract

The popular Q-learning algorithm is known to overestimate action values under certain conditions. It was not previously known whether, in practice, such overestimations are common, whether they harm performance, and whether they can generally be prevented. In this paper, we answer all these questions affirmatively. In particular, we first show that the recent DQN algorithm, which combines Q-learning with a deep neural network, suffers from substantial overestimations in some games in the Atari 2600 domain. We then show that the idea behind the Double Q-learning algorithm, which was introduced in a tabular setting, can be generalized to work with large-scale function approximation. We propose a specific adaptation to the DQN algorithm and show that the resulting algorithm not only reduces the observed overestimations, as hypothesized, but that this also leads to much better performance on several games.
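
The abstract describes the adaptation only in words. As a minimal sketch, assuming the usual formulation in which the online network selects the next action and the target network evaluates it, the two bootstrap targets can be compared as follows. The function names and the numbers are illustrative assumptions, not taken from the authors' code.

```python
from typing import Sequence


def argmax(values: Sequence[float]) -> int:
    """Return the index of the largest value (first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def dqn_target(reward: float, gamma: float,
               q_target_next: Sequence[float], done: bool = False) -> float:
    """DQN-style target: the target network both selects and evaluates
    the next action, via a max over its own estimates."""
    if done:
        return reward
    return reward + gamma * max(q_target_next)


def double_dqn_target(reward: float, gamma: float,
                      q_online_next: Sequence[float],
                      q_target_next: Sequence[float],
                      done: bool = False) -> float:
    """Double-DQN-style target: the online network selects the action,
    the target network evaluates it."""
    if done:
        return reward
    best_action = argmax(q_online_next)
    return reward + gamma * q_target_next[best_action]


# Hypothetical action-value estimates for the next state (made-up numbers).
q_online_next = [1.0, 2.0, 1.5]
q_target_next = [1.25, 1.75, 2.5]

y_dqn = dqn_target(reward=1.0, gamma=0.5, q_target_next=q_target_next)
y_ddqn = double_dqn_target(reward=1.0, gamma=0.5,
                           q_online_next=q_online_next,
                           q_target_next=q_target_next)
print(y_dqn, y_ddqn)
```

Because the evaluated action is chosen by a different set of estimates, the Double-DQN-style target can never exceed the max-based target computed from the same target-network values, which is the mechanism by which the overestimation is reduced.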

Benchmark Results

| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| Atari 2600 Alien | DQN hs | Score | 634 | | Unverified |
| Atari 2600 Alien | DQN noop | Score | 1,620 | | Unverified |
| Atari 2600 Alien | DDQN (tuned) hs | Score | 1,033.4 | | Unverified |
| Atari 2600 Amidar | DQN hs | Score | 178.4 | | Unverified |
| Atari 2600 Amidar | DDQN (tuned) hs | Score | 169.1 | | Unverified |
| Atari 2600 Amidar | DQN noop | Score | 978 | | Unverified |
| Atari 2600 Assault | DQN noop | Score | 4,280.4 | | Unverified |
| Atari 2600 Assault | DDQN (tuned) hs | Score | 6,060.8 | | Unverified |
| Atari 2600 Assault | DQN hs | Score | 3,489.3 | | Unverified |
| Atari 2600 Asterix | DQN hs | Score | 3,170.5 | | Unverified |
| Atari 2600 Asterix | DDQN (tuned) hs | Score | 16,837 | | Unverified |
| Atari 2600 Asterix | DQN noop | Score | 4,359 | | Unverified |
| Atari 2600 Asteroids | DDQN (tuned) hs | Score | 1,193.2 | | Unverified |
| Atari 2600 Asteroids | DQN noop | Score | 1,364.5 | | Unverified |
| Atari 2600 Asteroids | Prior+Duel hs | Score | 1,021.9 | | Unverified |
| Atari 2600 Asteroids | DQN hs | Score | 1,458.7 | | Unverified |
| Atari 2600 Atlantis | DQN noop | Score | 279,987 | | Unverified |
| Atari 2600 Atlantis | DDQN (tuned) hs | Score | 319,688 | | Unverified |
| Atari 2600 Atlantis | DQN hs | Score | 292,491 | | Unverified |
| Atari 2600 Atlantis | Prior+Duel hs | Score | 423,252 | | Unverified |
| Atari 2600 Bank Heist | Prior+Duel hs | Score | 1,004.6 | | Unverified |
| Atari 2600 Bank Heist | DQN hs | Score | 312.7 | | Unverified |
| Atari 2600 Bank Heist | DDQN (tuned) hs | Score | 886 | | Unverified |
| Atari 2600 Bank Heist | DQN noop | Score | 455 | | Unverified |
| Atari 2600 Battle Zone | DDQN (tuned) hs | Score | 24,740 | | Unverified |
| Atari 2600 Battle Zone | DQN hs | Score | 23,750 | | Unverified |
| Atari 2600 Battle Zone | Prior+Duel hs | Score | 30,650 | | Unverified |
| Atari 2600 Battle Zone | DQN noop | Score | 29,900 | | Unverified |
| Atari 2600 Beam Rider | Prior+Duel hs | Score | 37,412.2 | | Unverified |
| Atari 2600 Beam Rider | DDQN (tuned) hs | Score | 17,417.2 | | Unverified |
| Atari 2600 Beam Rider | DQN hs | Score | 9,743.2 | | Unverified |
| Atari 2600 Beam Rider | DQN noop | Score | 8,627.5 | | Unverified |
| Atari 2600 Berzerk | Prior+Duel hs | Score | 2,178.6 | | Unverified |
| Atari 2600 Berzerk | DQN hs | Score | 493.4 | | Unverified |
| Atari 2600 Berzerk | DDQN (tuned) hs | Score | 1,011.1 | | Unverified |
| Atari 2600 Berzerk | DQN noop | Score | 585.6 | | Unverified |
| Atari 2600 Bowling | DQN noop | Score | 50.4 | | Unverified |
| Atari 2600 Bowling | DQN hs | Score | 56.5 | | Unverified |
| Atari 2600 Bowling | Prior+Duel hs | Score | 50.4 | | Unverified |
| Atari 2600 Bowling | DDQN (tuned) hs | Score | 69.6 | | Unverified |
| Atari 2600 Boxing | Prior+Duel hs | Score | 79.2 | | Unverified |
| Atari 2600 Boxing | DQN hs | Score | 70.3 | | Unverified |
| Atari 2600 Boxing | DQN noop | Score | 88 | | Unverified |
| Atari 2600 Boxing | DDQN (tuned) hs | Score | 73.5 | | Unverified |
| Atari 2600 Breakout | DQN hs | Score | 354.5 | | Unverified |
| Atari 2600 Breakout | Prior+Duel hs | Score | 354.6 | | Unverified |
| Atari 2600 Breakout | DDQN (tuned) hs | Score | 368.9 | | Unverified |
| Atari 2600 Breakout | DQN noop | Score | 385.5 | | Unverified |
| Atari 2600 Centipede | DQN noop | Score | 4,657.7 | | Unverified |
| Atari 2600 Centipede | Prior+Duel hs | Score | 5,570.2 | | Unverified |
