DNA: Proximal Policy Optimization with a Dual Network Architecture

2022-06-20

Matthew Aitchison, Penny Sweetser

Abstract

This paper explores the problem of simultaneously learning a value function and policy in deep actor-critic reinforcement learning models. We find that the common practice of learning these functions jointly is sub-optimal, due to an order-of-magnitude difference in noise levels between the two tasks. Instead, we show that learning these tasks independently, but with a constrained distillation phase, significantly improves performance. Furthermore, we find that the policy gradient noise level can be decreased by using a lower-variance return estimate, whereas the value learning noise level decreases with a lower-bias estimate. Together, these insights inform an extension to Proximal Policy Optimization that we call Dual Network Architecture (DNA), which significantly outperforms its predecessor. DNA also exceeds the performance of the popular Rainbow DQN algorithm on four of the five environments tested, even under more difficult stochastic control settings.
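
The point about return estimates can be illustrated with generalized advantage estimation (GAE), where a parameter λ trades bias against variance: a smaller λ gives a lower-variance but more biased estimate, and a larger λ gives the reverse. The sketch below is not the authors' implementation. The function name `gae_advantages`, the toy trajectory, and the λ values 0.8 and 0.95 are assumptions chosen only for illustration, and GAE itself is used here as one plausible family of return estimators.

```python
def gae_advantages(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Generalized advantage estimates for one trajectory with no episode boundary.

    rewards[t] and values[t] are the reward and value prediction at step t;
    last_value is the value prediction for the state after the final step.
    """
    next_values = list(values[1:]) + [last_value]
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Accumulate discounted one-step TD errors backwards through time.
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_values[t] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


# Toy trajectory (hypothetical numbers).
rewards = [1.0, 0.0, 2.0]
values = [0.5, 0.4, 0.3]
last_value = 0.0

# Assumed setting: a smaller lambda for the policy (lower variance) ...
policy_adv = gae_advantages(rewards, values, last_value, lam=0.8)

# ... and a larger lambda for the value targets (lower bias).
value_adv = gae_advantages(rewards, values, last_value, lam=0.95)
value_targets = [a + v for a, v in zip(value_adv, values)]
```

With λ = 0 the estimate reduces to the one-step TD error, and with λ = 1 the value targets become the discounted Monte Carlo returns, which makes the two ends of the bias-variance trade-off easy to check by hand.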

Benchmark Results

| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| Atari 2600 Alien | DNA | Score | 5,021 | | Unverified |
| Atari 2600 Amidar | DNA | Score | 1,025 | | Unverified |
| Atari 2600 Assault | DNA | Score | 16,293 | | Unverified |
| Atari 2600 Asterix | DNA | Score | 323,965 | | Unverified |
| Atari 2600 Asteroids | DNA | Score | 165,973 | | Unverified |
| Atari 2600 Atlantis | DNA | Score | 932,559 | | Unverified |
| Atari 2600 Bank Heist | DNA | Score | 1,286 | | Unverified |
| Atari 2600 Battle Zone | DNA | Score | 71,003 | | Unverified |
| Atari 2600 Beam Rider | DNA | Score | 20,393 | | Unverified |
| Atari 2600 Berzerk | DNA | Score | 19,789 | | Unverified |
| Atari 2600 Bowling | DNA | Score | 181 | | Unverified |
| Atari 2600 Boxing | DNA | Score | 99.9 | | Unverified |
| Atari 2600 Breakout | DNA | Score | 626 | | Unverified |
| Atari 2600 Centipede | DNA | Score | 100,194 | | Unverified |
| Atari 2600 Chopper Command | DNA | Score | 31,181 | | Unverified |
| Atari 2600 Crazy Climber | DNA | Score | 131,623 | | Unverified |
| Atari 2600 Defender | DNA | Score | 152,768 | | Unverified |
| Atari 2600 Demon Attack | DNA | Score | 97,909 | | Unverified |
| Atari 2600 Double Dunk | DNA | Score | -1.3 | | Unverified |
| Atari 2600 Enduro | DNA | Score | 2,059 | | Unverified |
| Atari 2600 Fishing Derby | DNA | Score | 57.4 | | Unverified |
| Atari 2600 Freeway | DNA | Score | 33 | | Unverified |
| Atari 2600 Frostbite | DNA | Score | 320 | | Unverified |
| Atari 2600 Gopher | DNA | Score | 80,104 | | Unverified |
| Atari 2600 Gravitar | DNA | Score | 2,190 | | Unverified |
| Atari 2600 HERO | DNA | Score | 24,904 | | Unverified |
| Atari 2600 Ice Hockey | DNA | Score | 7.2 | | Unverified |
| Atari 2600 James Bond | DNA | Score | 14,102 | | Unverified |
| Atari 2600 Kangaroo | DNA | Score | 14,373 | | Unverified |
| Atari 2600 Krull | DNA | Score | 10,956 | | Unverified |
| Atari 2600 Kung-Fu Master | DNA | Score | 110,962 | | Unverified |
| Atari 2600 Montezuma's Revenge | DNA | Score | 0 | | Unverified |
| Atari 2600 Ms. Pacman | DNA | Score | 5,894 | | Unverified |
| Atari 2600 Name This Game | DNA | Score | 20,226 | | Unverified |
| Atari 2600 Phoenix | DNA | Score | 391,085 | | Unverified |
| Atari 2600 Pitfall! | DNA | Score | 0 | | Unverified |
| Atari 2600 Pong | DNA | Score | 19.7 | | Unverified |
| Atari 2600 Private Eye | DNA | Score | 100 | | Unverified |
| Atari 2600 Q*Bert | DNA | Score | 52,398 | | Unverified |
| Atari 2600 River Raid | DNA | Score | 16,789 | | Unverified |
| Atari 2600 Road Runner | DNA | Score | 61,713 | | Unverified |
| Atari 2600 Robotank | DNA | Score | 64.8 | | Unverified |
| Atari 2600 Seaquest | DNA | Score | 4,146 | | Unverified |
| Atari 2600 Skiing | DNA | Score | -29,974 | | Unverified |
| Atari 2600 Solaris | DNA | Score | 2,225 | | Unverified |
| Atari 2600 Space Invaders | DNA | Score | 2,731 | | Unverified |
| Atari 2600 Star Gunner | DNA | Score | 104,125 | | Unverified |
| Atari 2600 Surround | DNA | Score | 5.3 | | Unverified |
| Atari 2600 Tennis | DNA | Score | -10.9 | | Unverified |
| Atari 2600 Time Pilot | DNA | Score | 12,774 | | Unverified |
