On the connection between Bregman divergence and value in regularized Markov decision processes
2022-10-21
Brendan O'Donoghue
Abstract
In this short note we derive a relationship between the Bregman divergence from the current policy to the optimal policy and the suboptimality of the current value function in a regularized Markov decision process. This result has implications for multi-task reinforcement learning, offline reinforcement learning, and regret analysis under function approximation, among other areas.
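As a hedged illustration of the kind of relationship the abstract describes (this sketch is not taken from the paper itself): in a one-state, entropy-regularized problem, the suboptimality of a policy's regularized value equals the temperature times the KL divergence from that policy to the optimal softmax policy, and the KL divergence is exactly the Bregman divergence generated by negative entropy. All rewards, the temperature `tau`, and the suboptimal policy `pi` below are made-up values.

```python
import numpy as np

# Hypothetical one-state (bandit) entropy-regularized problem; the
# numbers below are assumptions chosen only for illustration.
tau = 0.5                       # regularization temperature (assumed)
r = np.array([1.0, 0.3, -0.2])  # per-action rewards (made up)

# Optimal policy for the entropy-regularized objective is softmax(r / tau),
# and the optimal regularized value is the log-sum-exp of r / tau times tau.
pi_star = np.exp(r / tau) / np.exp(r / tau).sum()
v_star = tau * np.log(np.exp(r / tau).sum())

# An arbitrary suboptimal policy and its entropy-regularized value:
# expected reward plus tau times the policy's entropy.
pi = np.array([0.5, 0.3, 0.2])
v_pi = pi @ r - tau * (pi @ np.log(pi))

# KL(pi || pi_star) is the Bregman divergence generated by negative entropy.
kl = pi @ np.log(pi / pi_star)

# In this simple case the value gap equals tau times the Bregman divergence.
assert np.isclose(v_star - v_pi, tau * kl)
```

The identity is exact here because `log pi_star = r / tau - log Z`, so `tau * KL(pi || pi_star)` expands to `-v_pi + v_star`; the note's contribution is a relationship of this flavor in the general regularized MDP setting.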