A Non-Parametric Approach to Dynamic Programming

2011-12-01 · NeurIPS 2011

Oliver B. Kroemer, Jan R. Peters

Abstract

In this paper, we consider the problem of policy evaluation for continuous-state systems. We present a non-parametric approach to policy evaluation, which uses kernel density estimation to represent the system. The true form of the value function for this model can be determined, and can be computed using Galerkin's method. Furthermore, we also present a unified view of several well-known policy evaluation methods. In particular, we show that the same Galerkin method can be used to derive Least-Squares Temporal Difference learning, Kernelized Temporal Difference learning, and a discrete-state Dynamic Programming solution, as well as our proposed method. In a numerical evaluation of these algorithms, the proposed approach performed better than the other methods.
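
To make the Galerkin view concrete, the sketch below implements the standard sample-based Least-Squares Temporal Difference (LSTD) estimator, one of the methods the abstract says can be derived this way. It is not the authors' proposed kernel density method. The idea is to require the Bellman residual of a linear value function V(s) ≈ φ(s)ᵀθ to be orthogonal to every basis function, which yields the linear system Aθ = b. The function name `lstd`, the `ridge` parameter, and the three-state toy chain are assumptions made for illustration only.

```python
import numpy as np


def lstd(phi, phi_next, rewards, gamma, ridge=0.0):
    """Standard LSTD estimate of theta, where V(s) is approximated by phi(s) @ theta.

    phi, phi_next: (n_samples, n_features) features of states and successor states.
    rewards:       (n_samples,) rewards observed on each transition.
    ridge:         optional regularizer (hypothetical convenience, not from the paper).
    """
    phi = np.asarray(phi, dtype=float)
    phi_next = np.asarray(phi_next, dtype=float)
    rewards = np.asarray(rewards, dtype=float)

    # Galerkin condition: the Bellman residual is orthogonal to each basis function,
    # i.e. sum_i phi_i * (phi_i @ theta - r_i - gamma * phi_next_i @ theta) = 0.
    A = phi.T @ (phi - gamma * phi_next)
    b = phi.T @ rewards
    A = A + ridge * np.eye(A.shape[0])
    return np.linalg.solve(A, b)


# Toy example (assumed): a deterministic cycle 0 -> 1 -> 2 -> 0,
# with reward 1 for leaving state 0 and reward 0 otherwise.
states = np.array([0, 1, 2])
next_states = np.array([1, 2, 0])
rewards = np.array([1.0, 0.0, 0.0])

features = np.eye(3)  # one-hot features, so LSTD reduces to the tabular solution
theta = lstd(features[states], features[next_states], rewards, gamma=0.5)
values = features @ theta
```

With one-hot features and every transition observed once, `A` equals I − γP and `b` equals the reward vector, so the estimate coincides with the exact discrete-state dynamic programming solution. This gives a small illustration of how a discrete-state solution and LSTD can come out of the same projection argument.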
