
S2VG: Soft Stochastic Value Gradient method

2019-09-25

Xiaoyu Tan, Chao Qu, Junwu Xiong, James Zhang

Abstract

Model-based reinforcement learning (MBRL) has shown advantages in sample efficiency over model-free reinforcement learning (MFRL). Despite the impressive results it achieves, it still faces a trade-off between the ease of data generation and model bias. In this paper, we propose a simple and elegant model-based reinforcement learning algorithm called the soft stochastic value gradient method (S2VG). S2VG combines the merits of maximum-entropy reinforcement learning and MBRL, and exploits both real and imaginary data. In particular, we embed the model in the policy training and learn Q and V functions from the real (or imaginary) data set. Such embedding enables us to compute an analytic policy gradient through back-propagation rather than through likelihood-ratio estimation, which can reduce the variance of the gradient estimate. We name our algorithm the Soft Stochastic Value Gradient method to indicate its connection with the well-known stochastic value gradient method of Heess et al. (2015).
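The variance claim in the abstract can be illustrated with a toy comparison. The sketch below is not S2VG itself: it assumes a one-dimensional Gaussian policy and a hypothetical quadratic objective `reward` standing in for a learned Q function, and compares a likelihood-ratio (score-function) estimate of the gradient with respect to the policy mean against a pathwise (reparameterized) estimate that differentiates through the sampled action. All function names and the choice of objective are assumptions made for illustration.

```python
import random
import statistics

def reward(a):
    # Hypothetical differentiable objective standing in for Q(s, a).
    return -(a - 2.0) ** 2

def reward_grad(a):
    # Analytic derivative of reward with respect to the action.
    return -2.0 * (a - 2.0)

def likelihood_ratio_sample(mu, sigma, rng):
    # Score-function estimator: reward(a) * d/dmu log N(a; mu, sigma^2).
    a = rng.gauss(mu, sigma)
    return reward(a) * (a - mu) / sigma ** 2

def pathwise_sample(mu, sigma, rng):
    # Reparameterized estimator: a = mu + sigma * eps, so da/dmu = 1
    # and the gradient flows through the action itself.
    eps = rng.gauss(0.0, 1.0)
    return reward_grad(mu + sigma * eps)

def estimate(sampler, mu, sigma, n, seed):
    # Return the sample mean and sample variance of n single-sample estimates.
    rng = random.Random(seed)
    samples = [sampler(mu, sigma, rng) for _ in range(n)]
    return sum(samples) / n, statistics.variance(samples)

mu, sigma, n = 0.0, 1.0, 20000
# Closed form: E[-(a - 2)^2] = -((mu - 2)^2 + sigma^2), so d/dmu = -2 (mu - 2).
true_grad = -2.0 * (mu - 2.0)
lr_mean, lr_var = estimate(likelihood_ratio_sample, mu, sigma, n, seed=0)
pw_mean, pw_var = estimate(pathwise_sample, mu, sigma, n, seed=0)

print(f"true gradient:    {true_grad:.3f}")
print(f"likelihood ratio: mean={lr_mean:.3f}, variance={lr_var:.3f}")
print(f"pathwise:         mean={pw_mean:.3f}, variance={pw_var:.3f}")
```

Both estimators are unbiased for this objective, but the pathwise estimate typically has a much smaller per-sample variance here. In a model-based setting, the analogous pathwise gradient would also be propagated through a learned dynamics model, which this sketch omits.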
