
The ODE Method for Asymptotic Statistics in Stochastic Approximation and Reinforcement Learning

2021-10-27

Vivek Borkar, Shuhang Chen, Adithya Devraj, Ioannis Kontoyiannis, Sean Meyn


Abstract

The paper concerns the $d$-dimensional stochastic approximation recursion $\theta_{n+1} = \theta_n + \alpha_{n+1} f(\theta_n, \Phi_{n+1})$, where $\{\Phi_n\}$ is a stochastic process on a general state space, satisfying a conditional Markov property that allows for parameter-dependent noise. The main results are established under additional conditions on the mean flow and a version of the Donsker-Varadhan Lyapunov drift condition known as (DV3): (i) An appropriate Lyapunov function is constructed that implies convergence of the estimates in $L_4$. (ii) A functional central limit theorem (CLT) is established, as well as the usual one-dimensional CLT for the normalized error. Moment bounds combined with the CLT imply convergence of the normalized covariance $E[z_n z_n^T]$ to the asymptotic covariance in the CLT, where $z_n := (\theta_n - \theta^*)/\sqrt{\alpha_n}$. (iii) The CLT holds for the normalized version $z^{PR}_n := \sqrt{n}\,[\theta^{PR}_n - \theta^*]$ of the averaged parameters $\theta^{PR}_n := n^{-1} \sum_{k=1}^n \theta_k$, subject to standard assumptions on the step-size. Moreover, the covariance in the CLT coincides with the minimal covariance of Polyak and Ruppert. (iv) An example is given where $f$ and $\bar{f}$ are linear in $\theta$, and $\{\Phi_n\}$ is a geometrically ergodic Markov chain but does not satisfy (DV3). While the algorithm is convergent, the second moment of $\theta_n$ is unbounded and in fact diverges. This arXiv version represents a major extension of the results in prior versions: the main results now allow for parameter-dependent noise, as is often the case in applications to reinforcement learning.
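To make the objects in the abstract concrete, here is a minimal, self-contained sketch (not code from the paper) of the recursion $\theta_{n+1} = \theta_n + \alpha_{n+1} f(\theta_n, \Phi_{n+1})$ with Markovian noise and Polyak-Ruppert averaging $\theta^{PR}_n = n^{-1}\sum_{k=1}^n \theta_k$. The chain, the function $f(\theta,\phi) = \phi - \theta$, and all numerical choices below are illustrative assumptions; the paper's setting is far more general.

```python
import random

def sa_polyak_ruppert(n_steps=200_000, seed=0):
    """Toy stochastic approximation with Polyak-Ruppert averaging.

    Phi is a two-state Markov chain on {0.0, 2.0} (stationary mean 1.0),
    and f(theta, phi) = phi - theta, so the mean flow theta' = 1 - theta
    has root theta* = 1.0.
    """
    rng = random.Random(seed)
    phi = 0.0        # current state of the Markov chain
    theta = 5.0      # arbitrary initial estimate
    theta_sum = 0.0  # running sum for the Polyak-Ruppert average
    for n in range(1, n_steps + 1):
        # Symmetric chain: switch states with probability 0.3.
        if rng.random() < 0.3:
            phi = 2.0 - phi
        # Step size alpha_n = n^{-0.85}: slower than 1/n, the standard
        # regime in which averaging achieves the minimal covariance.
        alpha = n ** -0.85
        theta += alpha * (phi - theta)   # theta_{n+1} = theta_n + alpha f(theta_n, Phi_{n+1})
        theta_sum += theta
    theta_pr = theta_sum / n_steps       # averaged parameters theta^PR_n
    return theta, theta_pr
```

Both the raw iterate and its average converge to $\theta^* = 1$ here, but the averaged sequence has the smaller asymptotic covariance, which is point (iii) of the abstract.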
