Emphatic TD Bellman Operator is a Contraction
2015-08-14
Assaf Hallak, Aviv Tamar, Shie Mannor
Abstract
Recently, Sutton, Mahmood, and White (2015) introduced the emphatic temporal differences (ETD) algorithm for off-policy evaluation in Markov decision processes. In this short note, we show that the projected fixed-point equation that underlies ETD involves a contraction operator, with a $\sqrt{\gamma}$-contraction modulus (where $\gamma$ is the discount factor). This allows us to provide error bounds on the approximation error of ETD. To our knowledge, these are the first error bounds for an off-policy evaluation algorithm under general target and behavior policies.
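The contraction claim can be checked numerically in the simplest setting. The following is a minimal sketch, not the authors' code, and it rests on several assumptions that are not in the abstract: no eligibility traces, a small randomly generated transition matrix `P` standing in for the target policy, an arbitrary positive probability vector `d` standing in for the behavior policy's stationary distribution, and emphatic weights defined by $f^\top = d^\top (I - \gamma P)^{-1}$. All function names are hypothetical. Because two applications of the Bellman operator differ by $\gamma P (v_1 - v_2)$, it is enough to compare $\|\gamma P v\|_f$ with $\|v\|_f$ for random vectors $v$.

```python
import math
import random


def random_stochastic_matrix(n_rows, n_cols, rng):
    """Return a row-stochastic matrix with strictly positive entries."""
    rows = []
    for _ in range(n_rows):
        row = [rng.random() + 1e-3 for _ in range(n_cols)]
        total = sum(row)
        rows.append([x / total for x in row])
    return rows


def emphatic_weights(P, d, gamma, iters=2000):
    """Approximate f with f^T = d^T (I - gamma P)^{-1}.

    Uses the fixed-point iteration f <- d + gamma * P^T f, which converges
    geometrically because gamma < 1.
    """
    n = len(d)
    f = list(d)
    for _ in range(iters):
        f = [d[j] + gamma * sum(P[i][j] * f[i] for i in range(n)) for j in range(n)]
    return f


def weighted_norm(v, w):
    """Weighted Euclidean norm: sqrt(sum_i w_i * v_i^2)."""
    return math.sqrt(sum(wi * vi * vi for wi, vi in zip(w, v)))


def worst_ratio(n=5, gamma=0.9, trials=200, seed=0):
    """Largest observed ||gamma P v||_f / ||v||_f over random vectors v."""
    rng = random.Random(seed)
    P = random_stochastic_matrix(n, n, rng)      # assumed target-policy transitions
    d = random_stochastic_matrix(1, n, rng)[0]   # assumed stand-in for the behavior distribution
    f = emphatic_weights(P, d, gamma)
    worst = 0.0
    for _ in range(trials):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        gPv = [gamma * sum(P[i][j] * v[j] for j in range(n)) for i in range(n)]
        worst = max(worst, weighted_norm(gPv, f) / weighted_norm(v, f))
    return worst


if __name__ == "__main__":
    gamma = 0.9
    print("worst observed ratio:", worst_ratio(gamma=gamma))
    print("sqrt(gamma) bound:   ", math.sqrt(gamma))
```

Under these assumptions the observed ratio should never exceed $\sqrt{\gamma}$. A projection onto a feature subspace that is orthogonal in the same $f$-weighted norm is non-expansive, so composing it with the Bellman operator would preserve the modulus. Random sampling only probes the bound and does not prove it.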