
Expected Sarsa(λ) with Control Variate for Variance Reduction

2019-06-25

Long Yang, Yu Zhang, Jun Wen, Qian Zheng, Pengfei Li, Gang Pan


Abstract

Off-policy learning is a powerful tool for reinforcement learning. However, the high variance of off-policy evaluation is a critical challenge that can drive off-policy learning into uncontrolled instability. In this paper, to reduce this variance, we introduce the control variate technique into Expected Sarsa(λ) and propose a tabular ES(λ)-CV algorithm. We prove that, given a sufficiently accurate estimate of the value function, the proposed ES(λ)-CV enjoys a lower variance than Expected Sarsa(λ). Furthermore, to extend ES(λ)-CV to a convergent algorithm with linear function approximation, we propose the GES(λ) algorithm under a convex-concave saddle-point formulation. We prove that GES(λ) achieves a convergence rate of O(1/T), which matches or outperforms many state-of-the-art gradient-based algorithms, while requiring a more relaxed condition. Numerical experiments show that the proposed algorithm attains lower variance and better performance than several state-of-the-art gradient-based TD learning algorithms: GQ(λ), GTB(λ) and ABQ(λ).
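The abstract builds on the control variate technique for variance reduction. The sketch below is a generic Monte Carlo illustration of that technique only, not the paper's ES(λ)-CV algorithm: we estimate E[f(X)] and subtract a correlated baseline with a known mean, which lowers the estimator's variance without changing its expectation. All names and the choice of f and g here are illustrative assumptions.

```python
import numpy as np

# Generic control-variate sketch (illustrative; not the paper's ES(lambda)-CV).
# Goal: estimate E[e^X] for X ~ Uniform(0, 1), whose true value is e - 1.
rng = np.random.default_rng(0)
n = 100_000
x = rng.uniform(0.0, 1.0, size=n)

f = np.exp(x)   # target samples: E[e^X] = e - 1
g = x           # control variate: E[X] = 0.5 is known exactly
g_mean = 0.5

# Variance-optimal coefficient: beta = Cov(f, g) / Var(g).
beta = np.cov(f, g)[0, 1] / np.var(g)

plain_est = f.mean()
cv_est = (f - beta * (g - g_mean)).mean()  # same expectation, lower variance

plain_var = f.var() / n
cv_var = (f - beta * (g - g_mean)).var() / n

print(plain_est, cv_est)      # both close to e - 1
print(cv_var < plain_var)     # control variate reduces sampling variance
```

Because e^x and x are highly correlated on [0, 1], subtracting the centered baseline cancels most of the sampling noise; the same principle underlies subtracting a value-function baseline in the off-policy setting.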
