A Distributional Perspective on Reinforcement Learning

2017-07-21 · ICML 2017 · Code Available

Marc G. Bellemare, Will Dabney, Rémi Munos

Abstract

In this paper we argue for the fundamental importance of the value distribution: the distribution of the random return received by a reinforcement learning agent. This is in contrast to the common approach to reinforcement learning which models the expectation of this return, or value. Although there is an established body of literature studying the value distribution, thus far it has always been used for a specific purpose such as implementing risk-aware behaviour. We begin with theoretical results in both the policy evaluation and control settings, exposing a significant distributional instability in the latter. We then use the distributional perspective to design a new algorithm which applies Bellman's equation to the learning of approximate value distributions. We evaluate our algorithm using the suite of games from the Arcade Learning Environment. We obtain both state-of-the-art results and anecdotal evidence demonstrating the importance of the value distribution in approximate reinforcement learning. Finally, we combine theoretical and empirical evidence to highlight the ways in which the value distribution impacts learning in the approximate setting.
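The abstract does not spell out how Bellman's equation is applied to approximate value distributions. As a rough illustration only, the sketch below assumes the "C51" model in the benchmark results represents the return as a categorical distribution over a fixed grid of atoms, and projects the shifted distribution r + γz back onto that grid. The function name, the parameter names, and the default bounds `v_min` and `v_max` are illustrative assumptions, not details taken from this page.

```python
import math


def categorical_projection(probs, reward, gamma, v_min=-10.0, v_max=10.0):
    """Project the distribution of r + gamma * Z onto a fixed support.

    probs[j] is the probability of atom z_j = v_min + j * delta_z.
    The support bounds are assumed hyperparameters (hypothetical defaults).
    """
    n = len(probs)
    delta_z = (v_max - v_min) / (n - 1)
    projected = [0.0] * n
    for j, p in enumerate(probs):
        z_j = v_min + j * delta_z
        # Apply the Bellman update to the atom, then clip to the support.
        tz = min(v_max, max(v_min, reward + gamma * z_j))
        # Fractional index of the updated atom on the fixed grid.
        b = (tz - v_min) / (v_max - v_min) * (n - 1)
        lower = min(int(math.floor(b)), n - 1)
        upper = min(lower + 1, n - 1)
        frac = b - lower
        # Split the probability mass between the two neighbouring atoms.
        projected[lower] += p * (1.0 - frac)
        projected[upper] += p * frac
    return projected


# Three atoms at -1, 0, 1 with all mass on 0; a reward of 0.5 moves the
# mass to 0.5, which is split evenly between the atoms at 0 and 1.
example = categorical_projection([0.0, 1.0, 0.0], reward=0.5, gamma=1.0,
                                 v_min=-1.0, v_max=1.0)
```

In a full agent the projected distribution would serve as the target for a cross-entropy loss against the predicted distribution; that training loop is omitted here.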

Benchmark Results

| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| Atari 2600 Alien | C51 noop | Score | 3,166 | | Unverified |
| Atari 2600 Amidar | C51 noop | Score | 1,735 | | Unverified |
| Atari 2600 Assault | C51 noop | Score | 7,203 | | Unverified |
| Atari 2600 Asterix | C51 noop | Score | 406,211 | | Unverified |
| Atari 2600 Asteroids | C51 noop | Score | 1,516 | | Unverified |
| Atari 2600 Atlantis | C51 noop | Score | 841,075 | | Unverified |
| Atari 2600 Bank Heist | C51 noop | Score | 976 | | Unverified |
| Atari 2600 Battle Zone | C51 noop | Score | 28,742 | | Unverified |
| Atari 2600 Beam Rider | C51 noop | Score | 14,074 | | Unverified |
| Atari 2600 Berzerk | C51 noop | Score | 1,645 | | Unverified |
| Atari 2600 Bowling | C51 noop | Score | 81.8 | | Unverified |
| Atari 2600 Boxing | C51 noop | Score | 97.8 | | Unverified |
| Atari 2600 Breakout | C51 noop | Score | 748 | | Unverified |
| Atari 2600 Centipede | C51 noop | Score | 9,646 | | Unverified |
| Atari 2600 Chopper Command | C51 noop | Score | 15,600 | | Unverified |
| Atari 2600 Crazy Climber | C51 noop | Score | 179,877 | | Unverified |
| Atari 2600 Demon Attack | C51 noop | Score | 130,955 | | Unverified |
| Atari 2600 Double Dunk | C51 noop | Score | 2.5 | | Unverified |
| Atari 2600 Enduro | C51 noop | Score | 3,454 | | Unverified |
| Atari 2600 Fishing Derby | C51 noop | Score | 8.9 | | Unverified |
| Atari 2600 Freeway | C51 noop | Score | 33.9 | | Unverified |
| Atari 2600 Frostbite | C51 noop | Score | 3,965 | | Unverified |
| Atari 2600 Gopher | C51 noop | Score | 33,641 | | Unverified |
| Atari 2600 Gravitar | C51 noop | Score | 440 | | Unverified |
| Atari 2600 HERO | C51 noop | Score | 38,874 | | Unverified |
| Atari 2600 Ice Hockey | C51 noop | Score | -3.5 | | Unverified |
| Atari 2600 James Bond | C51 noop | Score | 1,909 | | Unverified |
| Atari 2600 Kangaroo | C51 noop | Score | 12,853 | | Unverified |
| Atari 2600 Krull | C51 noop | Score | 9,735 | | Unverified |
| Atari 2600 Kung-Fu Master | C51 noop | Score | 48,192 | | Unverified |
| Atari 2600 Ms. Pacman | C51 noop | Score | 3,415 | | Unverified |
| Atari 2600 Name This Game | C51 noop | Score | 12,542 | | Unverified |
| Atari 2600 Pong | C51 noop | Score | 20.9 | | Unverified |
| Atari 2600 Private Eye | C51 noop | Score | 15,095 | | Unverified |
| Atari 2600 Q*Bert | C51 noop | Score | 23,784 | | Unverified |
| Atari 2600 River Raid | C51 noop | Score | 17,322 | | Unverified |
| Atari 2600 Road Runner | C51 noop | Score | 55,839 | | Unverified |
| Atari 2600 Robotank | C51 noop | Score | 52.3 | | Unverified |
| Atari 2600 Seaquest | C51 noop | Score | 266,434 | | Unverified |
| Atari 2600 Space Invaders | C51 noop | Score | 5,747 | | Unverified |
| Atari 2600 Star Gunner | C51 noop | Score | 49,095 | | Unverified |
| Atari 2600 Tennis | C51 noop | Score | 23.1 | | Unverified |
| Atari 2600 Time Pilot | C51 noop | Score | 8,329 | | Unverified |
| Atari 2600 Tutankham | C51 noop | Score | 280 | | Unverified |
| Atari 2600 Up and Down | C51 noop | Score | 15,612 | | Unverified |
| Atari 2600 Venture | C51 noop | Score | 1,520 | | Unverified |
| Atari 2600 Video Pinball | C51 noop | Score | 949,604 | | Unverified |
| Atari 2600 Wizard of Wor | C51 noop | Score | 9,300 | | Unverified |
| Atari 2600 Zaxxon | C51 noop | Score | 10,513 | | Unverified |
