Value Prediction Network
Junhyuk Oh, Satinder Singh, Honglak Lee
Code
- github.com/junhyukoh/value-prediction-network (official, TensorFlow)
Abstract
This paper proposes a novel deep reinforcement learning (RL) architecture, called Value Prediction Network (VPN), which integrates model-free and model-based RL methods into a single neural network. In contrast to typical model-based RL methods, VPN learns a dynamics model whose abstract states are trained to make option-conditional predictions of future values (discounted sum of rewards) rather than of future observations. Our experimental results show that VPN has several advantages over both model-free and model-based baselines in a stochastic environment where careful planning is required but building an accurate observation-prediction model is difficult. Furthermore, VPN outperforms Deep Q-Network (DQN) on several Atari games even with short-lookahead planning, demonstrating its potential as a new way of learning a good state representation.
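The abstract describes VPN's core idea: instead of predicting future observations, learned modules predict option-conditional rewards and the *values* of abstract future states, and a short lookahead combines these predictions into Q-value estimates. The sketch below illustrates that planning recursion with randomly initialised stand-ins for the learned modules; all dimensions, module names, and the exact mixing rule (averaging the immediate value estimate with the best shorter-horizon estimate) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes (not from the paper).
STATE_DIM, NUM_OPTIONS = 8, 4
GAMMA = 0.99

# Random stand-ins for VPN's learned core modules: an option-conditional
# transition over abstract states, an outcome (reward) module, and a
# value module. In VPN these are trained networks; here they are fixed.
W_trans = rng.normal(size=(NUM_OPTIONS, STATE_DIM, STATE_DIM)) * 0.1
W_reward = rng.normal(size=(NUM_OPTIONS, STATE_DIM))
w_value = rng.normal(size=STATE_DIM)

def transition(s, o):
    """Predict the next abstract state after executing option o."""
    return np.tanh(W_trans[o] @ s)

def outcome(s, o):
    """Predict the immediate option-conditional reward."""
    return float(W_reward[o] @ s)

def value(s):
    """Predict the value of an abstract state."""
    return float(w_value @ s)

def q_plan(s, o, d):
    """d-step lookahead Q-value: roll the abstract model forward one
    step, then mix the value estimate of the reached state with the
    best (d-1)-step estimate from that state."""
    r = outcome(s, o)
    s_next = transition(s, o)
    if d == 1:
        return r + GAMMA * value(s_next)
    best_next = max(q_plan(s_next, o2, d - 1) for o2 in range(NUM_OPTIONS))
    return r + GAMMA * ((1.0 / d) * value(s_next)
                        + ((d - 1.0) / d) * best_next)

# Plan with a short (3-step) lookahead and act greedily.
s0 = rng.normal(size=STATE_DIM)
q_values = [q_plan(s0, o, d=3) for o in range(NUM_OPTIONS)]
best_option = int(np.argmax(q_values))
```

Because planning operates entirely on abstract states, no observation ever needs to be reconstructed, which is what lets VPN plan in environments where observation-prediction models are hard to learn.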
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| Atari 2600 Alien | VPN | Score | 1,429 | — | Unverified |
| Atari 2600 Amidar | VPN | Score | 641 | — | Unverified |
| Atari 2600 Crazy Climber | VPN | Score | 54,119 | — | Unverified |
| Atari 2600 Enduro | VPN | Score | 382 | — | Unverified |
| Atari 2600 Frostbite | VPN | Score | 3,811 | — | Unverified |
| Atari 2600 Krull | VPN | Score | 15,930 | — | Unverified |
| Atari 2600 Ms. Pacman | VPN | Score | 2,689 | — | Unverified |
| Atari 2600 Q*Bert | VPN | Score | 14,517 | — | Unverified |
| Atari 2600 Seaquest | VPN | Score | 5,628 | — | Unverified |