
Generalized Emphatic Temporal Difference Learning: Bias-Variance Analysis

2015-09-17

Assaf Hallak, Aviv Tamar, Remi Munos, Shie Mannor

Abstract

We consider the off-policy evaluation problem in Markov decision processes with function approximation. We propose a generalization of the recently introduced emphatic temporal differences (ETD) algorithm (SuttonMW15), which encompasses the original ETD(λ), as well as several other off-policy evaluation algorithms, as special cases. We call this framework ETD(λ, β), where our introduced parameter β controls the decay rate of an importance-sampling term. We study conditions under which the projected fixed-point equation underlying ETD(λ, β) involves a contraction operator, allowing us to present the first asymptotic error bounds (bias) for ETD(λ, β). Our results show that the original ETD algorithm always involves a contraction operator, and its bias is bounded. Moreover, by controlling β, our proposed generalization allows trading off bias for variance reduction, thereby achieving a lower overall error.
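As a rough illustration of the idea in the abstract (not code from the paper), the λ = 0 case of an emphatic off-policy update with a β-decayed follow-on trace can be sketched as below. The function names, signatures, and the exact placement of the trace recursion are assumptions for this sketch; setting β = γ recovers the original ETD-style trace, while smaller β shortens the memory of past importance-sampling ratios.

```python
import numpy as np

def etd_beta_evaluate(transitions, phi, rho, gamma=0.99, beta=0.9,
                      alpha=0.01, d=4):
    """Sketch of an ETD(0, beta)-style off-policy linear evaluation pass.

    transitions: iterable of (s, a, r, s_next) tuples
    phi: feature map, s -> np.ndarray of shape (d,)
    rho: importance-sampling ratio pi(a|s) / mu(a|s) for the taken action
    """
    theta = np.zeros(d)
    F = 1.0  # follow-on (emphatic) trace, F_0 = 1
    for (s, a, r, s_next) in transitions:
        rho_t = rho(s, a)
        # standard one-step TD error under linear function approximation
        delta = r + gamma * (theta @ phi(s_next)) - theta @ phi(s)
        # emphatic update: the trace F re-weights the TD step
        theta += alpha * F * rho_t * delta * phi(s)
        # beta controls how fast past importance-sampling products decay;
        # beta = gamma would give the original ETD follow-on recursion
        F = beta * rho_t * F + 1.0
    return theta
```

With β = 0 and all ratios equal to one, F stays at 1 and the loop reduces to ordinary on-policy TD(0), which matches the abstract's claim that the framework contains simpler evaluation algorithms as special cases.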
