Natural Policy Gradients In Reinforcement Learning Explained

2022-09-05

W. J. A. van Heeswijk


Abstract

Traditional policy gradient methods are fundamentally flawed. Natural gradients converge quicker and better, forming the foundation of contemporary Reinforcement Learning algorithms such as Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO). This lecture note aims to clarify the intuition behind natural policy gradients, focusing on the thought process and the key mathematical constructs.

Tasks

Policy Gradient Methods, reinforcement-learning, Reinforcement Learning, Reinforcement Learning (RL)
