
Variance-Reduced Conservative Policy Iteration

2022-12-12

Naman Agarwal, Brian Bullins, Karan Singh


Abstract

We study the sample complexity of reducing reinforcement learning to a sequence of empirical risk minimization problems over the policy space. Such reduction-based algorithms exhibit local convergence in the function space, as opposed to the parameter space for policy gradient algorithms, and are thus unaffected by the possibly non-linear or discontinuous parameterization of the policy class. We propose a variance-reduced variant of Conservative Policy Iteration that improves the sample complexity of producing an ε-functional local optimum from O(ε^-4) to O(ε^-3). Under state-coverage and policy-completeness assumptions, the algorithm enjoys ε-global optimality after sampling O(ε^-2) times, improving upon the previously established O(ε^-3) sample requirement.
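The conservative update at the heart of Conservative Policy Iteration mixes the current policy with the output of an empirical risk minimization (greedy improvement) step, rather than replacing it outright. Below is a minimal NumPy sketch of that mixing step for a tabular policy; the function name `cpi_update`, the greedy oracle, and the mixing coefficient `alpha` are illustrative assumptions, not the paper's exact algorithm (which additionally applies variance reduction to the advantage estimates).

```python
import numpy as np

def cpi_update(policy, advantages, alpha):
    """One conservative policy iteration step (illustrative sketch).

    policy:     (S, A) array of action probabilities per state
    advantages: (S, A) array of estimated advantages, a stand-in for
                the ERM oracle's input
    alpha:      mixing coefficient in (0, 1]
    """
    # Greedy policy w.r.t. the estimated advantages: per state, put all
    # probability mass on the highest-advantage action.
    greedy = np.zeros_like(policy)
    greedy[np.arange(policy.shape[0]), advantages.argmax(axis=1)] = 1.0
    # Conservative mixture keeps the new policy close to the old one,
    # which is what yields local convergence in the function space.
    return (1.0 - alpha) * policy + alpha * greedy

rng = np.random.default_rng(0)
pi = np.full((4, 2), 0.5)            # uniform policy: 4 states, 2 actions
adv = rng.standard_normal((4, 2))    # placeholder advantage estimates
pi_next = cpi_update(pi, adv, alpha=0.2)
print(pi_next.sum(axis=1))           # each row remains a distribution
```

With `alpha=0.2` and a uniform starting policy, each state's greedy action moves from probability 0.5 to 0.6 and the other to 0.4, so the update is a small, controlled step toward the greedy policy.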
