SOTAVerified

Stochastic Dimension-reduced Second-order Methods for Policy Optimization

2023-01-28

Jinsong Liu, Chenghan Xie, Qi Deng, Dongdong Ge, Yinyu Ye


Abstract

In this paper, we propose several new stochastic second-order algorithms for policy optimization that require only gradient and Hessian-vector product computations in each iteration, making them computationally efficient and comparable in cost to policy gradient methods. Specifically, we propose a dimension-reduced second-order method (DR-SOPO) that repeatedly solves a projected two-dimensional trust-region subproblem. We show that DR-SOPO obtains an O(ε^{-3.5}) complexity for reaching an approximate first-order stationary condition and a certain subspace second-order stationary condition. In addition, we present an enhanced algorithm (DVR-SOPO), which further improves the complexity to O(ε^{-3}) based on the variance-reduction technique. Preliminary experiments show that our proposed algorithms perform favorably compared with stochastic and variance-reduced policy gradient methods.
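The core step described in the abstract — solving a trust-region subproblem restricted to a two-dimensional subspace, with curvature accessed only through Hessian-vector products — can be sketched as follows. This is an illustrative reconstruction, not the paper's algorithm: the function names (`dr_step`, `solve_tr_2d`), the choice of subspace (gradient plus a random direction), and the fixed trust-region radius are assumptions for the sketch, and the paper's stochastic gradient/Hessian estimators and variance-reduction machinery are not reproduced.

```python
import numpy as np

def solve_tr_2d(g, H, radius):
    """Solve min g^T s + 0.5 s^T H s subject to ||s|| <= radius in 2-D.

    Uses the interior Newton step when it is feasible, otherwise
    bisects on the dual variable lam so that ||s(lam)|| = radius.
    (The degenerate "hard case" is ignored in this sketch.)
    """
    lam_min = np.linalg.eigvalsh(H)[0]  # smallest eigenvalue
    if lam_min > 1e-12:
        s = -np.linalg.solve(H, g)
        if np.linalg.norm(s) <= radius:
            return s  # unconstrained minimizer lies inside the ball
    # Boundary solution: grow hi until s(hi) is feasible, then bisect.
    lo = max(0.0, -lam_min) + 1e-12
    hi = lo + 1.0
    while np.linalg.norm(np.linalg.solve(H + hi * np.eye(2), -g)) > radius:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if np.linalg.norm(np.linalg.solve(H + mid * np.eye(2), -g)) > radius:
            lo = mid
        else:
            hi = mid
    return np.linalg.solve(H + hi * np.eye(2), -g)

def dr_step(grad_fn, hvp_fn, x, radius=1.0, rng=None):
    """One dimension-reduced trust-region step (illustrative sketch).

    Builds a 2-D subspace from the gradient and a random direction,
    projects the Hessian into it using only two Hessian-vector
    products, and solves the resulting 2x2 subproblem exactly.
    """
    rng = np.random.default_rng() if rng is None else rng
    g = grad_fn(x)
    v = rng.standard_normal(x.shape)
    Q, _ = np.linalg.qr(np.stack([g, v], axis=1))  # orthonormal (d, 2) basis
    g_s = Q.T @ g
    # Two Hessian-vector products give the projected 2x2 Hessian.
    H_s = Q.T @ np.stack([hvp_fn(x, Q[:, 0]), hvp_fn(x, Q[:, 1])], axis=1)
    H_s = 0.5 * (H_s + H_s.T)  # symmetrize against round-off
    s = solve_tr_2d(g_s, H_s, radius)
    return x + Q @ s
```

Because the subspace always contains the current gradient, each step decreases the objective at least as much as an exact steepest-descent line search, which is what drives the first-order guarantee; the random second direction is what lets the method probe curvature beyond the gradient direction.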
