
Deflated Dynamics Value Iteration

2024-07-15

Jongmin Lee, Amin Rakhsha, Ernest K. Ryu, Amir-Massoud Farahmand


Abstract

The Value Iteration (VI) algorithm is an iterative procedure to compute the value function of a Markov decision process, and is the basis of many reinforcement learning (RL) algorithms as well. As the error convergence rate of VI as a function of the iteration count k is O(γ^k), it is slow when the discount factor γ is close to 1. To accelerate the computation of the value function, we propose Deflated Dynamics Value Iteration (DDVI). DDVI uses matrix splitting and matrix deflation techniques to effectively remove (deflate) the top s dominant eigen-structure of the transition matrix P^π. We prove that this leads to an O(γ^k |λ_{s+1}|^k) convergence rate, where λ_{s+1} is the (s+1)-th largest eigenvalue of the dynamics matrix. We then extend DDVI to the RL setting and present the Deflated Dynamics Temporal Difference (DDTD) algorithm. We empirically show the effectiveness of the proposed algorithms.
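The two rates in the abstract can be checked numerically on a toy example. The sketch below (a hypothetical 2-state Markov chain; it is not the authors' implementation, and plain rank-one deflation of the dominant eigenvalue stands in for the paper's general top-s deflation) first runs policy-evaluation VI and observes the per-iteration error ratio approaching γ, then subtracts the top (Perron) eigen-component of the stochastic matrix and shows the spectral radius dropping from 1 to |λ_2|, which is what makes a (γ|λ_2|)^k rate plausible.

```python
import numpy as np

gamma = 0.99
P_pi = np.array([[0.9, 0.1],
                 [0.2, 0.8]])          # row-stochastic dynamics; lambda_2 = 0.7
r = np.array([1.0, 0.0])               # rewards (arbitrary choice)

# Exact solution of the fixed-point equation V = r + gamma * P_pi @ V.
V_star = np.linalg.solve(np.eye(2) - gamma * P_pi, r)

# Plain VI: the error contracts like gamma^k, because rho(P_pi) = 1.
V = np.zeros(2)
errs = []
for k in range(50):
    V = r + gamma * P_pi @ V
    errs.append(np.max(np.abs(V - V_star)))
print(errs[-1] / errs[-2])   # per-step error ratio, close to gamma = 0.99

# Deflation: remove the top eigen-component of P_pi.  mu is the stationary
# distribution, i.e. the left eigenvector for eigenvalue 1.
eigvals, left = np.linalg.eig(P_pi.T)
mu = np.real(left[:, np.argmax(np.real(eigvals))])
mu /= mu.sum()
P_defl = P_pi - np.ones((2, 1)) @ mu[None, :]   # rank-one deflation

rho = lambda M: np.max(np.abs(np.linalg.eigvals(M)))
print(rho(P_pi))    # spectral radius 1.0
print(rho(P_defl))  # |lambda_2| = 0.7 for this chain: iterating with
                    # gamma * P_defl contracts like (gamma * |lambda_2|)^k
```

The deflated matrix keeps every eigenvalue of P_pi except the dominant one, which is sent to 0; this is the sense in which removing the top eigen-structure can speed up the iteration when γ is close to 1.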
