Reinforcement Learning: a Comparison of UCB Versus Alternative Adaptive Policies

2019-09-13

Wesley Cowan, Michael N. Katehakis, Daniel Pirutinsky

Abstract

In this paper we consider the basic version of Reinforcement Learning (RL), which involves computing optimal data-driven (adaptive) policies for Markov decision processes with unknown transition probabilities. We provide a brief survey of the state of the art in the area, and we compare the performance of the classic UCB policy of Burnetas and Katehakis (1997) with a new policy developed herein, which we call MDP-Deterministic Minimum Empirical Divergence (MDP-DMED), and with a method based on posterior sampling (MDP-PS).
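To make the posterior-sampling idea concrete, the following is a minimal sketch of one decision step, not the paper's MDP-PS algorithm. It assumes independent Dirichlet posteriors over each state-action transition row, known rewards, and a discounted criterion solved by value iteration; none of these choices come from the abstract. The names `sample_transitions`, `greedy_policy`, `prior`, `discount`, and `sweeps` are hypothetical.

```python
import random


def sample_transitions(counts, rng, prior=1.0):
    """Draw one transition model from independent Dirichlet posteriors.

    counts[s][a][t] is the number of observed transitions s -a-> t.
    The symmetric Dirichlet prior weight `prior` is an assumption.
    """
    model = []
    for per_state in counts:
        rows = []
        for row in per_state:
            # A Dirichlet draw is a vector of normalized Gamma draws.
            draws = [rng.gammavariate(c + prior, 1.0) for c in row]
            total = sum(draws)
            rows.append([x / total for x in draws])
        model.append(rows)
    return model


def greedy_policy(model, rewards, discount=0.95, sweeps=500):
    """Run value iteration on a model and return one action per state.

    rewards[s][a] is assumed known; the discounted criterion is an
    illustrative assumption, not taken from the paper.
    """
    n = len(model)

    def q(s, a, v):
        # One-step lookahead value of action a in state s.
        return rewards[s][a] + discount * sum(
            p * v[t] for t, p in enumerate(model[s][a])
        )

    v = [0.0] * n
    for _ in range(sweeps):
        v = [max(q(s, a, v) for a in range(len(model[s]))) for s in range(n)]
    return [
        max(range(len(model[s])), key=lambda a, s=s: q(s, a, v))
        for s in range(n)
    ]
```

In a learning loop under these assumptions, each round would call `sample_transitions` on the current counts, act according to `greedy_policy` for the sampled model, and then update the counts with the observed transition.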