Generalized Data Distribution Iteration

2022-06-07

Jiajun Fan, Changnan Xiao

Abstract

Achieving high sample efficiency and superior final performance simultaneously has been one of the major challenges for deep reinforcement learning (DRL). Previous work could handle one of these challenges but typically failed to address both concurrently. In this paper, we try to tackle the two challenges simultaneously. To this end, we first decouple them into two classic RL problems: data richness and the exploration-exploitation trade-off. We then cast these two problems as a training data distribution optimization problem, namely obtaining the desired training data within limited interactions, and address them concurrently via i) explicit modeling and control of the capacity and diversity of the behavior policy and ii) more fine-grained and adaptive control of the selective/sampling distribution of the behavior policy using a monotonic data distribution optimization. Finally, we integrate this process into Generalized Policy Iteration (GPI) and obtain a more general framework called Generalized Data Distribution Iteration (GDI). We use the GDI framework to introduce operator-based versions of well-known RL methods from DQN to Agent57. We provide a theoretical guarantee of the superiority of GDI over GPI. We also demonstrate state-of-the-art (SOTA) performance on the Arcade Learning Environment (ALE), where our algorithm achieves 9620.33% mean human normalized score (HNS) and 1146.39% median HNS, and surpasses 22 human world records using only 200M training frames. Our performance is comparable to Agent57's while consuming 500 times less data. We argue that there is still a long way to go before obtaining real superhuman agents in ALE.
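The abstract reports mean and median human normalized score (HNS) without spelling out the metric. A minimal sketch of how such aggregates are commonly computed is given below, assuming the usual definition HNS = 100 * (agent - random) / (human - random). The function name, the game names, and all baseline numbers are hypothetical placeholders for illustration and are not taken from the paper.

```python
from statistics import mean, median


def human_normalized_score(agent, random, human):
    """Return HNS as a percentage: 100 * (agent - random) / (human - random)."""
    if human == random:
        raise ValueError("human and random baselines must differ")
    return 100.0 * (agent - random) / (human - random)


# Hypothetical per-game scores; real baselines are not given in this document.
games = {
    "game_a": {"agent": 150.0, "random": 50.0, "human": 100.0},
    "game_b": {"agent": 30.0, "random": 10.0, "human": 50.0},
    "game_c": {"agent": 1000.0, "random": 0.0, "human": 100.0},
}

# Per-game HNS, then the two aggregates quoted in the abstract.
hns = {name: human_normalized_score(**scores) for name, scores in games.items()}
mean_hns = mean(list(hns.values()))
median_hns = median(list(hns.values()))

print(hns)
print(f"mean HNS = {mean_hns:.2f}%, median HNS = {median_hns:.2f}%")
```

In this toy data a single very high per-game value pulls the mean far above the median, which is one reason both aggregates are usually reported together.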

Benchmark Results

| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| Atari 2600 Alien | GDI-I3 | Score | 43,384 | | Unverified |
| Atari 2600 Alien | GDI-H3 | Score | 48,735 | | Unverified |
| Atari 2600 Amidar | GDI-I3 | Score | 1,442 | | Unverified |
| Atari 2600 Amidar | GDI-H3 | Score | 1,065 | | Unverified |
| Atari 2600 Assault | GDI-H3 | Score | 97,155 | | Unverified |
| Atari 2600 Assault | GDI-I3 | Score | 63,876 | | Unverified |
| Atari 2600 Asterix | GDI-I3 | Score | 759,910 | | Unverified |
| Atari 2600 Asterix | GDI-H3 | Score | 999,999 | | Unverified |
| Atari 2600 Asteroids | GDI-I3 | Score | 751,970 | | Unverified |
| Atari 2600 Asteroids | GDI-H3 | Score | 760,005 | | Unverified |
| Atari 2600 Atlantis | GDI-I3 | Score | 3,803,000 | | Unverified |
| Atari 2600 Atlantis | GDI-H3 | Score | 3,837,300 | | Unverified |
| Atari 2600 Bank Heist | GDI-I3 | Score | 1,401 | | Unverified |
| Atari 2600 Bank Heist | GDI-H3 | Score | 1,380 | | Unverified |
| Atari 2600 Battle Zone | GDI-H3 | Score | 824,360 | | Unverified |
| Atari 2600 Battle Zone | GDI-I3 | Score | 478,830 | | Unverified |
| Atari 2600 Beam Rider | GDI-H3 | Score | 422,890 | | Unverified |
| Atari 2600 Beam Rider | GDI-I3 | Score | 162,100 | | Unverified |
| Atari 2600 Berzerk | GDI-I3 | Score | 7,607 | | Unverified |
| Atari 2600 Berzerk | GDI-H3 | Score | 14,649 | | Unverified |
| Atari 2600 Bowling | GDI-H3 | Score | 205.2 | | Unverified |
| Atari 2600 Bowling | GDI-I3 | Score | 201.9 | | Unverified |
| Atari 2600 Boxing | GDI-I3 | Score | 100 | | Unverified |
| Atari 2600 Boxing | GDI-H3 | Score | 100 | | Unverified |
| Atari 2600 Breakout | GDI-H3 | Score | 864 | | Unverified |
| Atari 2600 Breakout | GDI-I3 | Score | 864 | | Unverified |
| Atari 2600 Breakout | GDI-I3(200M frames) | Score | 864 | | Unverified |
| Atari 2600 Breakout | GDI-H3(200M frames) | Score | 864 | | Unverified |
| Atari 2600 Centipede | GDI-I3 | Score | 155,830 | | Unverified |
| Atari 2600 Centipede | GDI-H3 | Score | 195,630 | | Unverified |
| Atari 2600 Chopper Command | GDI-I3 | Score | 999,999 | | Unverified |
| Atari 2600 Chopper Command | GDI-H3 | Score | 999,999 | | Unverified |
| Atari 2600 Crazy Climber | GDI-I3 | Score | 201,000 | | Unverified |
| Atari 2600 Crazy Climber | GDI-H3 | Score | 241,170 | | Unverified |
| Atari 2600 Defender | GDI-H3 | Score | 970,540 | | Unverified |
| Atari 2600 Defender | GDI-I3 | Score | 893,110 | | Unverified |
| Atari 2600 Demon Attack | GDI-I3 | Score | 675,530 | | Unverified |
| Atari 2600 Demon Attack | GDI-H3 | Score | 787,985 | | Unverified |
| Atari 2600 Double Dunk | GDI-H3 | Score | 24 | | Unverified |
| Atari 2600 Double Dunk | GDI-I3 | Score | 24 | | Unverified |
| Atari 2600 Enduro | GDI-H3 | Score | 14,300 | | Unverified |
| Atari 2600 Enduro | GDI-I3 | Score | 14,330 | | Unverified |
| Atari 2600 Fishing Derby | GDI-I3 | Score | 59 | | Unverified |
| Atari 2600 Fishing Derby | GDI-H3 | Score | 65 | | Unverified |
| Atari 2600 Freeway | GDI-H3(200M frames) | Score | 34 | | Unverified |
| Atari 2600 Freeway | GDI-H3 | Score | 34 | | Unverified |
| Atari 2600 Frostbite | GDI-H3 | Score | 11,330 | | Unverified |
| Atari 2600 Frostbite | GDI-H3(200M frames) | Score | 11,330 | | Unverified |
| Atari 2600 Gopher | GDI-H3 | Score | 473,560 | | Unverified |
| Atari 2600 Gopher | GDI-I3 | Score | 488,830 | | Unverified |