
On the Global Convergence of Risk-Averse Policy Gradient Methods with Expected Conditional Risk Measures

2023-01-26

Xian Yu, Lei Ying


Abstract

Risk-sensitive reinforcement learning (RL) has become a popular tool for controlling the risk of uncertain outcomes and ensuring reliable performance in highly stochastic sequential decision-making problems. While Policy Gradient (PG) methods have been developed for risk-sensitive RL, it remains unclear if these methods enjoy the same global convergence guarantees as in the risk-neutral case (Mei et al., 2020; Agarwal et al., 2021; Cen et al., 2022; Bhandari and Russo, 2024). In this paper, we consider a class of dynamic time-consistent risk measures, named Expected Conditional Risk Measures (ECRMs), and derive PG and Natural Policy Gradient (NPG) updates for ECRMs-based RL problems. We provide global optimality and iteration complexities of the proposed algorithms under the following four settings: (i) PG with constrained direct parameterization, (ii) PG with softmax parameterization and log barrier regularization, (iii) NPG with softmax parameterization and entropy regularization, and (iv) approximate NPG with inexact policy evaluation. Furthermore, we test a risk-averse REINFORCE algorithm (Williams, 1992) and a risk-averse NPG algorithm (Kakade, 2001) on a stochastic Cliffwalk environment to demonstrate the efficacy of our methods and the importance of risk control.
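For context (a sketch in our notation, not necessarily the paper's), ECRMs belong to the family of nested, time-consistent dynamic risk measures, in which a one-step conditional risk mapping $\rho_t$ (e.g., CVaR) replaces the conditional expectation at every stage of the cost-to-go:

$$\min_{\pi}\; c_0 + \gamma\,\rho_1\!\big(c_1 + \gamma\,\rho_2(c_2 + \cdots)\big).$$

ECRMs are the subclass in which each $\rho_t$ is an expected conditional risk measure, and the abstract's claim is that this structure admits PG and NPG updates with global convergence guarantees.

To give a feel for a risk-averse REINFORCE-style update, here is a minimal, self-contained Python sketch. It is our illustration, not the authors' algorithm: for simplicity it optimizes a *static* CVaR of the episode cost via the Rockafellar-Uryasev representation $\mathrm{CVaR}_\alpha(C) = \min_\eta \eta + \tfrac{1}{\alpha}\,\mathbb{E}[(C-\eta)_+]$, rather than the paper's nested ECRM objective, and the toy environment (a chain with a rare catastrophic cost, standing in for the stochastic Cliffwalk) is hypothetical.

```python
import numpy as np

# Minimal sketch of a risk-averse REINFORCE-style update on a toy chain.
# NOT the paper's algorithm: it minimizes a static CVaR of episode cost
# via the Rockafellar-Uryasev form, not a nested ECRM objective.

rng = np.random.default_rng(0)

n_states, n_actions, horizon = 4, 2, 10
alpha = 0.1                    # CVaR level: focus on the worst 10% of costs
lr_theta, lr_eta = 0.05, 0.05
theta = np.zeros((n_states, n_actions))   # softmax policy parameters
eta = 0.0                                  # auxiliary variable (VaR estimate)

def softmax(x):
    z = x - x.max()
    p = np.exp(z)
    return p / p.sum()

def step(s, a):
    # Hypothetical dynamics: action 0 is safe (cost 2); action 1 is cheaper
    # in expectation (cost 1.5) but carries a rare catastrophic cost.
    if a == 0:
        return min(s + 1, n_states - 1), 2.0
    cost = 0.0 if rng.random() < 0.9 else 15.0
    return min(s + 1, n_states - 1), cost

for it in range(2000):
    # Roll out one episode, accumulating score-function gradients and cost.
    s, total_cost = 0, 0.0
    grads = np.zeros_like(theta)
    for t in range(horizon):
        p = softmax(theta[s])
        a = rng.choice(n_actions, p=p)
        g = -p
        g[a] += 1.0              # grad of log pi(a|s) under softmax
        grads[s] += g
        s, c = step(s, a)
        total_cost += c
    # CVaR_alpha(C) = min_eta eta + E[(C - eta)_+] / alpha.
    excess = max(total_cost - eta, 0.0)
    # Stochastic gradient in eta: 1 - 1{C > eta}/alpha.
    eta -= lr_eta * (1.0 - (1.0 if total_cost > eta else 0.0) / alpha)
    # REINFORCE-style step: only tail episodes move the policy.
    theta -= lr_theta * (excess / alpha) * grads

print("learned action probabilities per state:")
print(np.apply_along_axis(softmax, 1, theta))
```

Because only episodes whose cost exceeds the running VaR estimate $\eta$ contribute to the policy gradient, the update steers probability mass toward the safe action even though the risky action is cheaper in expectation, which is precisely the kind of tail-risk control the experiments in the paper are designed to demonstrate.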
