Increasing the Action Gap: New Operators for Reinforcement Learning

2015-12-15

Marc G. Bellemare, Georg Ostrovski, Arthur Guez, Philip S. Thomas, Rémi Munos

Code available.

Abstract

This paper introduces new optimality-preserving operators on Q-functions. We first describe an operator for tabular representations, the consistent Bellman operator, which incorporates a notion of local policy consistency. We show that this local consistency leads to an increase in the action gap at each state; increasing this gap, we argue, mitigates the undesirable effects of approximation and estimation errors on the induced greedy policies. This operator can also be applied to discretized continuous space and time problems, and we provide empirical results evidencing superior performance in this context. Extending the idea of a locally consistent operator, we then derive sufficient conditions for an operator to preserve optimality, leading to a family of operators which includes our consistent Bellman operator. As corollaries we provide a proof of optimality for Baird's advantage learning algorithm and derive other gap-increasing operators with interesting properties. We conclude with an empirical study on 60 Atari 2600 games illustrating the strong potential of these new operators.
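The abstract names the operators without writing them out. The sketch below is a minimal tabular illustration, assuming the commonly cited forms: the consistent Bellman operator replaces the max with Q(x, a) on self-transitions, and advantage learning subtracts alpha times the gap V(x) - Q(x, a) from the Bellman backup. These forms are not stated on this page and should be checked against the paper. The two-state MDP, the function names, and alpha = 0.5 are hypothetical choices made only for illustration.

```python
import numpy as np

def bellman(Q, R, P, gamma):
    """Standard Bellman optimality backup.
    Q, R: arrays of shape (S, A); P: shape (S, A, S) with P[x, a, x']."""
    V = Q.max(axis=1)                 # V(x) = max_b Q(x, b)
    return R + gamma * (P @ V)        # (S, A, S) @ (S,) -> (S, A)

def consistent_bellman(Q, R, P, gamma):
    """Assumed form: on a self-transition (x' == x), back up Q(x, a)
    instead of max_b Q(x, b)."""
    V = Q.max(axis=1, keepdims=True)
    self_p = np.einsum("xax->xa", P)  # P(x | x, a), shape (S, A)
    return bellman(Q, R, P, gamma) - gamma * self_p * (V - Q)

def advantage_learning(Q, R, P, gamma, alpha):
    """Assumed form: Bellman backup minus alpha * (V(x) - Q(x, a))."""
    V = Q.max(axis=1, keepdims=True)
    return bellman(Q, R, P, gamma) - alpha * (V - Q)

def action_gap(Q):
    """Difference between the best and second-best action value per state."""
    top2 = np.sort(Q, axis=1)[:, -2:]
    return top2[:, 1] - top2[:, 0]

def fixed_point(op, shape, n_iter=2000):
    """Apply the operator repeatedly, starting from an all-zero Q-table."""
    Q = np.zeros(shape)
    for _ in range(n_iter):
        Q = op(Q)
    return Q

# Hypothetical deterministic MDP: 2 states, 2 actions.
# Action 0 always leads to state 0, action 1 always leads to state 1.
P = np.zeros((2, 2, 2))
P[:, 0, 0] = 1.0
P[:, 1, 1] = 1.0
R = np.array([[1.0, 0.0],
              [0.0, 0.5]])
gamma, alpha = 0.9, 0.5               # alpha is an arbitrary illustrative value

Q_b = fixed_point(lambda Q: bellman(Q, R, P, gamma), R.shape)
Q_c = fixed_point(lambda Q: consistent_bellman(Q, R, P, gamma), R.shape)
Q_al = fixed_point(lambda Q: advantage_learning(Q, R, P, gamma, alpha), R.shape)

for name, Q in [("Bellman", Q_b), ("consistent", Q_c), ("advantage learning", Q_al)]:
    print(name, "greedy policy:", Q.argmax(axis=1), "action gap:", action_gap(Q))
```

On this toy problem all three operators yield the same greedy policy and the same maximal values per state, while the action gaps under the two alternative operators are at least as large as under the standard backup. That is the behavior the abstract describes as optimality-preserving and gap-increasing.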

Benchmark Results

| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| Atari 2600 Alien | Persistent AL | Score | 5,699.81 | | Unverified |
| Atari 2600 Alien | Advantage Learning | Score | 4,990.91 | | Unverified |
| Atari 2600 Amidar | Persistent AL | Score | 1,451.65 | | Unverified |
| Atari 2600 Amidar | Advantage Learning | Score | 1,557.43 | | Unverified |
| Atari 2600 Assault | Persistent AL | Score | 3,304.33 | | Unverified |
| Atari 2600 Assault | Advantage Learning | Score | 3,661.51 | | Unverified |
| Atari 2600 Asterix | Advantage Learning | Score | 12,852.08 | | Unverified |
| Atari 2600 Asterix | Persistent AL | Score | 19,564.9 | | Unverified |
| Atari 2600 Asteroids | Persistent AL | Score | 1,673.52 | | Unverified |
| Atari 2600 Asteroids | Advantage Learning | Score | 1,924.42 | | Unverified |
| Atari 2600 Atlantis | Advantage Learning | Score | 553,591.67 | | Unverified |
| Atari 2600 Atlantis | Persistent AL | Score | 1,465,250 | | Unverified |
| Atari 2600 Bank Heist | Advantage Learning | Score | 633.63 | | Unverified |
| Atari 2600 Bank Heist | Persistent AL | Score | 874.99 | | Unverified |
| Atari 2600 Battle Zone | Advantage Learning | Score | 28,789.29 | | Unverified |
| Atari 2600 Battle Zone | Persistent AL | Score | 34,583.07 | | Unverified |
| Atari 2600 Beam Rider | Persistent AL | Score | 13,145.34 | | Unverified |
| Atari 2600 Beam Rider | Advantage Learning | Score | 10,054.58 | | Unverified |
| Atari 2600 Berzerk | Persistent AL | Score | 1,328.25 | | Unverified |
| Atari 2600 Berzerk | Advantage Learning | Score | 747.26 | | Unverified |
| Atari 2600 Bowling | Advantage Learning | Score | 57.41 | | Unverified |
| Atari 2600 Bowling | Persistent AL | Score | 71.59 | | Unverified |
| Atari 2600 Boxing | Persistent AL | Score | 94.3 | | Unverified |
| Atari 2600 Boxing | Advantage Learning | Score | 93.94 | | Unverified |
| Atari 2600 Breakout | Advantage Learning | Score | 425.32 | | Unverified |
| Atari 2600 Breakout | Persistent AL | Score | 431.89 | | Unverified |
| Atari 2600 Centipede | Persistent AL | Score | 4,539.55 | | Unverified |
| Atari 2600 Centipede | Advantage Learning | Score | 4,225.18 | | Unverified |
| Atari 2600 Chopper Command | Advantage Learning | Score | 5,431.36 | | Unverified |
| Atari 2600 Chopper Command | Persistent AL | Score | 5,734.93 | | Unverified |
| Atari 2600 Crazy Climber | Advantage Learning | Score | 123,410.71 | | Unverified |
| Atari 2600 Crazy Climber | Persistent AL | Score | 130,002.71 | | Unverified |
| Atari 2600 Defender | Advantage Learning | Score | 30,643.59 | | Unverified |
| Atari 2600 Defender | Persistent AL | Score | 32,038.93 | | Unverified |
| Atari 2600 Demon Attack | Advantage Learning | Score | 27,153.48 | | Unverified |
| Atari 2600 Demon Attack | Persistent AL | Score | 70,908.17 | | Unverified |
| Atari 2600 Double Dunk | Persistent AL | Score | -2.51 | | Unverified |
| Atari 2600 Double Dunk | Advantage Learning | Score | -0.15 | | Unverified |
| Atari 2600 Elevator Action | Persistent AL | Score | 29,100 | | Unverified |
| Atari 2600 Elevator Action | Advantage Learning | Score | 27,088.89 | | Unverified |
| Atari 2600 Enduro | Advantage Learning | Score | 1,252.7 | | Unverified |
| Atari 2600 Enduro | Persistent AL | Score | 1,343.1 | | Unverified |
| Atari 2600 Fishing Derby | Persistent AL | Score | 28.13 | | Unverified |
| Atari 2600 Fishing Derby | Advantage Learning | Score | 21.32 | | Unverified |
| Atari 2600 Freeway | Persistent AL | Score | 32.3 | | Unverified |
| Atari 2600 Freeway | Advantage Learning | Score | 31.72 | | Unverified |
| Atari 2600 Frostbite | Advantage Learning | Score | 2,305.82 | | Unverified |
| Atari 2600 Frostbite | Persistent AL | Score | 3,248.96 | | Unverified |
| Atari 2600 Gopher | Advantage Learning | Score | 11,912.68 | | Unverified |
| Atari 2600 Gopher | Persistent AL | Score | 10,611.81 | | Unverified |