Stochastic Approximation for Risk-aware Markov Decision Processes
Wenjie Huang, William B. Haskell
Abstract
We develop a stochastic approximation-type algorithm to solve finite state/action, infinite-horizon, risk-aware Markov decision processes. Our algorithm has two loops: the inner loop computes the risk by solving a stochastic saddle-point problem, and the outer loop performs Q-learning to compute an optimal risk-aware policy. Several widely investigated risk measures (e.g., conditional value-at-risk, optimized certainty equivalent, and absolute semi-deviation) are covered by our algorithm. Almost sure convergence and the convergence rate of the algorithm are established. For an error tolerance ε > 0 on the optimal Q-value estimation gap and a learning rate k ∈ (1/2, 1], the overall convergence rate of our algorithm is Ω((ln(1/(δε))/ε²)^(1/k) + (ln(1/ε))^(1/(1−k))) with probability at least 1 − δ.
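The two-loop structure described above can be illustrated with a minimal sketch. This is not the paper's algorithm; it is a simplified illustration under assumed parameters, using conditional value-at-risk (one of the risk measures the abstract lists) in its Rockafellar–Uryasev saddle-point form CVaR_α(Z) = min_η { η + E[(Z − η)₊]/(1 − α) }: the inner loop runs stochastic approximation over the dual variable η to estimate the risk of the sampled Q-targets, and the outer loop performs a Q-learning update with learning rate n^(−k) for k ∈ (1/2, 1]. The toy MDP (sizes, costs, discount, CVaR level) is invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy MDP (assumed for illustration): 3 states, 2 actions.
# P[s, a] is a next-state distribution; R[s, a] is a mean stage cost.
n_states, n_actions = 3, 2
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))
gamma = 0.95        # discount factor (assumed)
alpha_cvar = 0.9    # CVaR level (assumed)

def sample_cost(s, a):
    """Noisy stage cost, so the saddle-point problem is genuinely stochastic."""
    return R[s, a] + 0.1 * rng.standard_normal()

def inner_cvar(sample_z, n_inner=200):
    """Inner loop: stochastic approximation on the Rockafellar-Uryasev
    objective  eta + E[(Z - eta)_+]/(1 - alpha), stepping eta along a
    subgradient and averaging the objective values as the risk estimate."""
    eta, total = 0.0, 0.0
    for t in range(1, n_inner + 1):
        z = sample_z()
        grad = 1.0 - float(z > eta) / (1.0 - alpha_cvar)  # subgradient in eta
        eta -= grad / t                                   # SA step, rate 1/t
        total += eta + max(z - eta, 0.0) / (1.0 - alpha_cvar)
    return total / n_inner

# Outer loop: Q-learning whose target is the inner-loop risk estimate.
Q = np.zeros((n_states, n_actions))
for n in range(1, 501):
    s = rng.integers(n_states)
    a = rng.integers(n_actions)

    def sample_z():
        # Sampled risk-to-go: stage cost plus discounted greedy continuation.
        s_next = rng.choice(n_states, p=P[s, a])
        return sample_cost(s, a) + gamma * Q[s_next].min()

    target = Q[s, a] + (inner_cvar(sample_z) - Q[s, a])
    Q[s, a] += (target - Q[s, a]) / n ** 0.75  # learning rate n^{-k}, k = 3/4

policy = Q.argmin(axis=1)  # greedy risk-aware policy (cost minimization)
```

The inner loop's 1/t stepsize and the outer loop's n^(−3/4) stepsize are one choice consistent with the abstract's k ∈ (1/2, 1] condition; the nesting (a full inner run per outer update) mirrors the two-loop description, not any specific schedule from the paper.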