Scalable First-Order Methods for Robust MDPs
Julien Grand-Clément, Christian Kroer
Unverified — Be the first to reproduce this paper.
ReproduceAbstract
Robust Markov Decision Processes (MDPs) are a powerful framework for modeling sequential decision-making problems with model uncertainty. This paper proposes the first first-order framework for solving robust MDPs. Our algorithm interleaves primal-dual first-order updates with approximate Value Iteration updates. By carefully controlling the tradeoff between the accuracy and cost of Value Iteration updates, we achieve an ergodic convergence rate of O ( A^2 S^3(S)(^-1) ^-1 ) for the best choice of parameters on ellipsoidal and Kullback-Leibler s-rectangular uncertainty sets, where S and A is the number of states and actions, respectively. Our dependence on the number of states and actions is significantly better (by a factor of O(A^1.5S^1.5)) than that of pure Value Iteration algorithms. In numerical experiments on ellipsoidal uncertainty sets we show that our algorithm is significantly more scalable than state-of-the-art approaches. Our framework is also the first one to solve robust MDPs with s-rectangular KL uncertainty sets.