
Sample Efficient Policy Gradient Methods with Recursive Variance Reduction

2019-09-18 · ICLR 2020 · Code Available

Pan Xu, Felicia Gao, Quanquan Gu


Abstract

Improving the sample efficiency in reinforcement learning has been a long-standing research problem. In this work, we aim to reduce the sample complexity of existing policy gradient methods. We propose a novel policy gradient algorithm called SRVR-PG, which only requires O(1/ε^{3/2}) episodes to find an ε-approximate stationary point of the nonconcave performance function J(θ) (i.e., a point θ such that ‖∇J(θ)‖₂² ≤ ε). This sample complexity improves upon the existing result O(1/ε^{5/3}) for stochastic variance reduced policy gradient algorithms by a factor of O(1/ε^{1/6}). In addition, we propose a variant of SRVR-PG with parameter exploration, which samples the initial policy parameter from a prior probability distribution. We conduct numerical experiments on classic control problems in reinforcement learning to validate the performance of our proposed algorithms.
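The core idea behind recursive variance reduction can be sketched in a toy setting. The code below is a minimal illustration, not the paper's implementation: the one-step "bandit" environment, the Gaussian policy, and all hyperparameters are assumptions chosen for readability. The inner loop uses a SARAH/SPIDER-style recursive gradient estimator, with an importance weight correcting for the fact that actions are sampled from the current policy while one term is evaluated at the previous one.

```python
import numpy as np

# Hypothetical one-step bandit: reward = -(action - TARGET)^2.
# Policy: Gaussian with mean theta and fixed std (illustrative assumptions).
TARGET, STD = 2.0, 1.0

def sample_actions(theta, n, rng):
    return rng.normal(theta, STD, size=n)

def reward(a):
    return -(a - TARGET) ** 2

def grad_log_pi(theta, a):
    # d/dtheta of log N(a; theta, STD^2) = (a - theta) / STD^2
    return (a - theta) / STD ** 2

def pg_estimate(theta, actions):
    # Plain REINFORCE estimator: mean of r(a) * grad log pi(a)
    return np.mean(reward(actions) * grad_log_pi(theta, actions))

def importance_weight(theta_old, theta_new, a):
    # pi_old(a) / pi_new(a) for the Gaussian policy, computed in log space
    logw = (-(a - theta_old) ** 2 + (a - theta_new) ** 2) / (2 * STD ** 2)
    return np.exp(logw)

def srvr_pg(theta0, epochs=20, inner=5, N=512, b=64, lr=0.05, seed=0):
    """SRVR-PG-style sketch: large reference batch per epoch, then small
    recursively corrected batches in the inner loop."""
    rng = np.random.default_rng(seed)
    theta = theta0
    for _ in range(epochs):
        # Reference gradient at the epoch anchor (large batch of N episodes)
        v = pg_estimate(theta, sample_actions(theta, N, rng))
        theta_prev = theta
        theta = theta + lr * v  # gradient ascent on J(theta)
        for _ in range(inner):
            a = sample_actions(theta, b, rng)
            w = importance_weight(theta_prev, theta, a)
            # Recursive update: v_t = g(theta_t) - w * g(theta_{t-1}) + v_{t-1}
            v = np.mean(reward(a) * grad_log_pi(theta, a)
                        - w * reward(a) * grad_log_pi(theta_prev, a)) + v
            theta_prev = theta
            theta = theta + lr * v
    return theta
```

Because consecutive iterates stay close, the correction term has small variance, so the small inner batches suffice; this is what drives the improved episode complexity relative to restarting from a fresh large-batch estimate at every step.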
