
Robust Policy Optimization in Continuous-time Mixed H_2/H_∞ Stochastic Control

2022-09-09

Leilei Cui, Lekan Molu


Abstract

Following the recent resurgence in establishing linear control-theoretic benchmarks for reinforcement learning (RL)-based policy optimization (PO) for complex dynamical systems with continuous state and action spaces, an optimal control problem for a continuous-time infinite-dimensional linear stochastic system possessing additive Brownian motion is optimized on a cost that is an exponential of the quadratic form of the state, input, and disturbance terms. We lay out a model-based and a model-free algorithm for RL-based stochastic PO. For the model-based algorithm, we establish rigorous convergence guarantees. For the sampling-based algorithm, over trajectory arcs that emanate from the phase space, we find that the Hamilton-Jacobi-Bellman equation parameterizes trajectory costs -- resulting in a discrete-time (input- and state-based) sampling scheme accompanied by unknown nonlinear dynamics with continuous-time policy iterates. The need for known dynamics operators is circumvented, and we arrive at a reinforced PO algorithm (via policy iteration) where an upper bound on the H_2 norm is minimized (to guarantee stability) and a robustness metric is enforced by maximizing the cost with respect to a controller that includes the level of noise attenuation specified by the system's H_∞ norm. A rigorous robustness analysis is prescribed in an input-to-state stability formalism. Our analyses and contributions are relevant to the many natural systems characterized by an additive Wiener process, amenable to Itô's stochastic differential calculus in dynamic game settings.
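The min-max structure sketched in the abstract (a policy-iteration loop that minimizes a quadratic cost over the control gain while maximizing over a disturbance attenuated at level γ) can be illustrated, for the model-based linear-quadratic zero-sum game that underlies mixed H_2/H_∞ design, with a minimal sketch. The system matrices, simultaneous gain updates, and stopping rule below are illustrative assumptions, not the authors' exact algorithm:

```python
import numpy as np

def solve_lyapunov(Acl, S):
    """Solve Acl' P + P Acl + S = 0 by vectorization (small n only).

    Writing the equation as M vec(P) = -vec(S) with
    M = kron(I, Acl') + kron(Acl', I)."""
    n = Acl.shape[0]
    I = np.eye(n)
    M = np.kron(I, Acl.T) + np.kron(Acl.T, I)
    P = np.linalg.solve(M, -S.reshape(-1)).reshape(n, n)
    return (P + P.T) / 2  # symmetrize against round-off

def mixed_h2_hinf_pi(A, B, D, Q, R, gamma, iters=50):
    """Illustrative policy iteration for the LQ zero-sum game
    dx = (Ax + Bu + Dw) dt, cost x'Qx + u'Ru - gamma^2 w'w.

    Assumes A is Hurwitz so the zero gains are a stabilizing start."""
    m, d = B.shape[1], D.shape[1]
    n = A.shape[0]
    K = np.zeros((m, n))  # minimizing (control) gain
    L = np.zeros((d, n))  # maximizing (worst-case disturbance) gain
    for _ in range(iters):
        Acl = A - B @ K + D @ L
        S = Q + K.T @ R @ K - gamma**2 * L.T @ L
        P = solve_lyapunov(Acl, S)        # policy evaluation
        K = np.linalg.solve(R, B.T @ P)   # minimizing-player update
        L = (D.T @ P) / gamma**2          # maximizing-player update
    return P, K, L

# Hypothetical example system (matrices chosen for illustration).
A = np.array([[-1.0, 0.5], [0.0, -2.0]])
B = np.array([[1.0], [0.5]])
D = np.array([[0.5], [1.0]])
P, K, L = mixed_h2_hinf_pi(A, B, D, np.eye(2), np.array([[1.0]]), gamma=5.0)
```

At a fixed point, P satisfies the game-type Riccati equation A'P + PA + Q - PBR⁻¹B'P + γ⁻²PDD'P = 0; the sampling-based algorithm in the paper reaches an analogous iteration without using the operators A, B, D directly.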
