Emphatic TD Bellman Operator is a Contraction

2015-08-14

Assaf Hallak, Aviv Tamar, Shie Mannor

Abstract

Recently, Sutton, Mahmood, and White (2015) introduced the emphatic temporal differences (ETD) algorithm for off-policy evaluation in Markov decision processes. In this short note, we show that the projected fixed-point equation that underlies ETD involves a contraction operator, with a √γ-contraction modulus (where γ is the discount factor). This allows us to provide error bounds on the approximation error of ETD. To our knowledge, these are the first error bounds for an off-policy evaluation algorithm under general target and behavior policies.
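
The contraction claim can be sanity-checked numerically on a small example. The following is a minimal sketch, not the paper's construction: it assumes the simplest setting (ETD with λ = 0 and unit interest in every state), where the emphatic weights satisfy f = d + γ Pᵀ f, and it checks that the linear part γP of the Bellman operator shrinks f-weighted norms by at most sqrt(γ). The names `P_target`, `d_behavior`, and all helper functions are hypothetical, and `d_behavior` is just a random positive distribution standing in for the behavior policy's stationary distribution.

```python
import math
import random


def emphatic_weights(P, d, gamma, iters=2000):
    """Iterate f <- d + gamma * P^T f; the fixed point is (I - gamma P^T)^{-1} d."""
    n = len(d)
    f = list(d)
    for _ in range(iters):
        f = [d[j] + gamma * sum(P[i][j] * f[i] for i in range(n)) for j in range(n)]
    return f


def weighted_norm(v, f):
    """Weighted Euclidean norm sqrt(sum_i f_i * v_i^2)."""
    return math.sqrt(sum(fi * vi * vi for fi, vi in zip(f, v)))


def apply_gamma_P(P, v, gamma):
    """Linear part of the Bellman operator: v -> gamma * P v."""
    return [gamma * sum(pij * vj for pij, vj in zip(row, v)) for row in P]


def random_distribution(n, rng):
    """Random strictly positive probability vector of length n."""
    raw = [rng.random() + 1e-3 for _ in range(n)]
    total = sum(raw)
    return [x / total for x in raw]


def worst_ratio(P, f, gamma, rng, trials=500):
    """Largest observed ||gamma P v||_f / ||v||_f over random test vectors v."""
    worst = 0.0
    for _ in range(trials):
        v = [rng.uniform(-1.0, 1.0) for _ in range(len(f))]
        ratio = weighted_norm(apply_gamma_P(P, v, gamma), f) / weighted_norm(v, f)
        worst = max(worst, ratio)
    return worst


rng = random.Random(0)
gamma = 0.9
n_states = 5

# Hypothetical target-policy transition matrix (each row is a distribution).
P_target = [random_distribution(n_states, rng) for _ in range(n_states)]
# Hypothetical stand-in for the behavior policy's stationary distribution.
d_behavior = random_distribution(n_states, rng)

f = emphatic_weights(P_target, d_behavior, gamma)
ratio = worst_ratio(P_target, f, gamma, rng)
bound = math.sqrt(gamma)
print(f"worst observed ratio: {ratio:.4f}, sqrt(gamma): {bound:.4f}")
```

Random sampling only gives a lower estimate of the true operator norm, so this is a plausibility check rather than a proof. The reason the bound holds in this special case is that f = d + γ Pᵀ f implies Pᵀ f ≤ f / γ elementwise, which combined with Jensen's inequality gives ||γ P v||_f ≤ sqrt(γ) ||v||_f.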

Tasks

Off-policy evaluation
