SOTAVerified

Value Prediction Network

2017-07-11 · NeurIPS 2017 · Code Available

Junhyuk Oh, Satinder Singh, Honglak Lee



Abstract

This paper proposes a novel deep reinforcement learning (RL) architecture, called Value Prediction Network (VPN), which integrates model-free and model-based RL methods into a single neural network. In contrast to typical model-based RL methods, VPN learns a dynamics model whose abstract states are trained to make option-conditional predictions of future values (discounted sum of rewards) rather than of future observations. Our experimental results show that VPN has several advantages over both model-free and model-based baselines in a stochastic environment where careful planning is required but building an accurate observation-prediction model is difficult. Furthermore, VPN outperforms Deep Q-Network (DQN) on several Atari games even with short-lookahead planning, demonstrating its potential as a new way of learning a good state representation.
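The core idea in the abstract — plan by rolling out an option-conditional dynamics model over abstract states and backing up predicted rewards and values, without ever reconstructing observations — can be sketched in plain NumPy. Everything below is an illustrative stand-in, not the paper's implementation: the dimensions, the random weights (which in VPN would be learned end-to-end), and the simple max-backup (the paper's planning additionally mixes value estimates across rollout depths).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration only (not from the paper).
STATE_DIM, NUM_OPTIONS, GAMMA = 8, 4, 0.99

# Random stand-ins for VPN's learned modules: an encoder, an
# option-conditional transition, and value / reward heads.
W_enc = rng.normal(size=(STATE_DIM, STATE_DIM)) * 0.5
W_trans = rng.normal(size=(NUM_OPTIONS, STATE_DIM, STATE_DIM)) * 0.1
w_value = rng.normal(size=STATE_DIM)
w_reward = rng.normal(size=(NUM_OPTIONS, STATE_DIM))

def encode(obs):
    """Map an observation to an abstract state (no pixel prediction)."""
    return np.tanh(W_enc @ obs)

def transition(s, o):
    """Predict the next abstract state after executing option o."""
    return np.tanh(W_trans[o] @ s)

def value(s):
    """Predicted value (discounted return) of an abstract state."""
    return float(w_value @ s)

def reward(s, o):
    """Predicted immediate reward for taking option o in state s."""
    return float(w_reward[o] @ s)

def q_plan(s, o, depth):
    """d-step lookahead: Q(s, o) = r(s, o) + gamma * max_o' Q(s', o')."""
    s_next = transition(s, o)
    if depth <= 1:
        return reward(s, o) + GAMMA * value(s_next)
    backup = max(q_plan(s_next, o2, depth - 1) for o2 in range(NUM_OPTIONS))
    return reward(s, o) + GAMMA * backup

# Short-lookahead planning from a single observation.
obs = rng.normal(size=STATE_DIM)
s0 = encode(obs)
q_values = [q_plan(s0, o, depth=3) for o in range(NUM_OPTIONS)]
best_option = int(np.argmax(q_values))
```

The key property the sketch preserves is that planning operates entirely in the abstract state space: `transition` never has to produce an observation, only a representation accurate enough for the reward and value heads to score.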

Benchmark Results

| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| Atari 2600 Alien | VPN | Score | 1,429 | — | Unverified |
| Atari 2600 Amidar | VPN | Score | 641 | — | Unverified |
| Atari 2600 Crazy Climber | VPN | Score | 54,119 | — | Unverified |
| Atari 2600 Enduro | VPN | Score | 382 | — | Unverified |
| Atari 2600 Frostbite | VPN | Score | 3,811 | — | Unverified |
| Atari 2600 Krull | VPN | Score | 15,930 | — | Unverified |
| Atari 2600 Ms. Pacman | VPN | Score | 2,689 | — | Unverified |
| Atari 2600 Q*Bert | VPN | Score | 14,517 | — | Unverified |
| Atari 2600 Seaquest | VPN | Score | 5,628 | — | Unverified |

Reproductions