Concentration bounds for SSP Q-learning for average cost MDPs

2022-06-07

Shaan ul Haque, Vivek Borkar

Abstract

We derive a concentration bound for a Q-learning algorithm for average cost Markov decision processes based on an equivalent shortest path problem, and compare it numerically with the alternative scheme based on relative value iteration.
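To make the two schemes named in the abstract concrete, here is a minimal sketch of a single tabular update for each. It assumes the textbook forms of relative value iteration (RVI) Q-learning and SSP Q-learning; the abstract does not spell out the updates, so the function names, the reference state and action, the step sizes, and the projection interval for the average-cost estimate are all illustrative assumptions rather than the authors' exact algorithm.

```python
import numpy as np

def rvi_update(Q, i, u, cost, j, step, ref=(0, 0)):
    """One RVI Q-learning step (assumed textbook form).

    The offset Q[ref] at a fixed reference state-action pair is subtracted
    from the target so that the iterates stay bounded.
    """
    new_Q = Q.copy()
    target = cost + Q[j].min() - Q[ref]
    new_Q[i, u] += step * (target - Q[i, u])
    return new_Q

def ssp_update(Q, lam, i, u, cost, j, step, lam_step,
               ref_state=0, lam_bounds=(-10.0, 10.0)):
    """One SSP Q-learning step (assumed textbook form).

    The reference state acts as the terminal state of a shortest path
    problem: the continuation value is dropped when the next state j is
    ref_state, and lam is a running estimate of the average cost.
    lam_bounds is a hypothetical projection interval.
    """
    new_Q = Q.copy()
    continuation = 0.0 if j == ref_state else Q[j].min()
    target = cost - lam + continuation
    new_Q[i, u] += step * (target - Q[i, u])
    # Slower-timescale update of the average-cost estimate, projected to lam_bounds.
    new_lam = float(np.clip(lam + lam_step * Q[ref_state].min(), *lam_bounds))
    return new_Q, new_lam

# Usage on a hypothetical 2-state, 2-action problem: one observed transition
# (state 0, action 0, cost 1.0, next state 1).
Q0 = np.array([[1.0, 2.0], [3.0, 4.0]])
Q_rvi = rvi_update(Q0, i=0, u=0, cost=1.0, j=1, step=0.5)
Q_ssp, lam = ssp_update(Q0, lam=0.5, i=0, u=0, cost=1.0, j=1,
                        step=0.5, lam_step=0.1)
```

In this sketch the two schemes differ only in how the target is centred: RVI subtracts a reference entry of the Q-table itself, while the SSP form subtracts a separately tracked scalar and treats returns to the reference state as episode ends.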

Tasks

Q-Learning
