SOTAVerified

Distributional Reinforcement Learning for Risk-Sensitive Policies

2021-01-01Unverified0· sign in to hype

Shiau Hong Lim, Ilyas Malik

Unverified — Be the first to reproduce this paper.

Reproduce

Abstract

We address the problem of learning a risk-sensitive policy based on the CVaR risk measure using distributional reinforcement learning. In particular, we show that applying the distributional Bellman optimality operator with respect to a risk-based action-selection strategy overestimates the dynamic, Markovian CVaR. The resulting policies can however still be overly conservative and one often prefers to learn an optimal policy based on the static, non-Markovian CVaR. To this end, we propose a modification to the existing algorithm and show that it can indeed learn a proper CVaR-optimized policy. Our proposed approach is a simple extension of standard distributional RL algorithms and can therefore take advantage of many of the recent advances in deep RL. On both synthetic and real data, we empirically show that our proposed algorithm is able to produce a family of risk-averse policies that achieves a better tradeoff between risk and the expected return.

Tasks

Reproductions