Value Prediction Network
Junhyuk Oh, Satinder Singh, Honglak Lee
Code
- github.com/junhyukoh/value-prediction-network (official, TensorFlow)
Abstract
This paper proposes a novel deep reinforcement learning (RL) architecture, called Value Prediction Network (VPN), which integrates model-free and model-based RL methods into a single neural network. In contrast to typical model-based RL methods, VPN learns a dynamics model whose abstract states are trained to make option-conditional predictions of future values (discounted sum of rewards) rather than of future observations. Our experimental results show that VPN has several advantages over both model-free and model-based baselines in a stochastic environment where careful planning is required but building an accurate observation-prediction model is difficult. Furthermore, VPN outperforms Deep Q-Network (DQN) on several Atari games even with short-lookahead planning, demonstrating its potential as a new way of learning a good state representation.
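The abstract describes VPN's core idea: instead of predicting future observations, learned modules predict option-conditional rewards and the *values* of abstract future states, and a short lookahead combines these predictions into Q-value estimates. The sketch below illustrates that planning recursion with randomly initialised stand-ins for the learned modules; all dimensions, module names, and the exact mixing rule (averaging the immediate value estimate with the best shorter-horizon estimate) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes (not from the paper).
STATE_DIM, NUM_OPTIONS = 8, 4
GAMMA = 0.99

# Random stand-ins for VPN's learned core modules: an option-conditional
# transition over abstract states, an outcome (reward) module, and a
# value module. In VPN these are trained networks; here they are fixed.
W_trans = rng.normal(size=(NUM_OPTIONS, STATE_DIM, STATE_DIM)) * 0.1
W_reward = rng.normal(size=(NUM_OPTIONS, STATE_DIM))
w_value = rng.normal(size=STATE_DIM)

def transition(s, o):
    """Predict the next abstract state after executing option o."""
    return np.tanh(W_trans[o] @ s)

def outcome(s, o):
    """Predict the immediate option-conditional reward."""
    return float(W_reward[o] @ s)

def value(s):
    """Predict the value of an abstract state."""
    return float(w_value @ s)

def q_plan(s, o, d):
    """d-step lookahead Q-value: roll the abstract model forward one
    step, then mix the value estimate of the reached state with the
    best (d-1)-step estimate from that state."""
    r = outcome(s, o)
    s_next = transition(s, o)
    if d == 1:
        return r + GAMMA * value(s_next)
    best_next = max(q_plan(s_next, o2, d - 1) for o2 in range(NUM_OPTIONS))
    return r + GAMMA * ((1.0 / d) * value(s_next)
                        + ((d - 1.0) / d) * best_next)

# Plan with a short (3-step) lookahead and act greedily.
s0 = rng.normal(size=STATE_DIM)
q_values = [q_plan(s0, o, d=3) for o in range(NUM_OPTIONS)]
best_option = int(np.argmax(q_values))
```

Because planning operates entirely on abstract states, no observation ever needs to be reconstructed, which is what lets VPN plan in environments where observation-prediction models are hard to learn.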
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| Atari 2600 Alien | VPN | Score | 1,429 | — | Unverified |
| Atari 2600 Amidar | VPN | Score | 641 | — | Unverified |
| Atari 2600 Crazy Climber | VPN | Score | 54,119 | — | Unverified |
| Atari 2600 Enduro | VPN | Score | 382 | — | Unverified |
| Atari 2600 Frostbite | VPN | Score | 3,811 | — | Unverified |
| Atari 2600 Krull | VPN | Score | 15,930 | — | Unverified |
| Atari 2600 Ms. Pacman | VPN | Score | 2,689 | — | Unverified |
| Atari 2600 Q*Bert | VPN | Score | 14,517 | — | Unverified |
| Atari 2600 Seaquest | VPN | Score | 5,628 | — | Unverified |