Lifelong Learning of Factored Policies via Policy Gradients

2020-06-12 · ICML Workshop LifelongML 2020

Jorge A Mendez, Eric Eaton

Abstract

Policy gradient methods have shown success in learning continuous control policies for high-dimensional dynamical systems. A major downside of such methods is the amount of exploration they require before yielding high-performing policies. In a lifelong learning setting, in which an agent is faced with multiple consecutive tasks over its lifetime, reusing information from previously seen tasks can substantially accelerate the learning of new tasks. We provide a novel method for lifelong policy gradient learning that trains lifelong function approximators directly via policy gradients, allowing the agent to benefit from accumulated knowledge throughout the entire training process. We show empirically that our algorithm learns faster and converges to better policies than single-task and lifelong learning baselines, and completely avoids catastrophic forgetting on a variety of challenging domains.
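
As a rough illustration of what training a factored policy directly via policy gradients could look like, the sketch below computes a single-sample REINFORCE-style gradient. Everything specific here is an assumption for illustration and is not stated in the abstract: the factorization of a task's policy parameters as `theta = L s` (a shared basis `L` and task-specific coefficients `s`), the linear-Gaussian policy, and all function and variable names are hypothetical.

```python
# Hypothetical sketch: policy-gradient update directions for a factored policy.
# Assumption (not from the abstract): task parameters are theta = L s, where
# L (d x k) is shared across tasks and s (length k) is specific to one task.

def matvec(L, s):
    """Multiply a d x k matrix (list of rows) by a length-k vector."""
    return [sum(l_ij * s_j for l_ij, s_j in zip(row, s)) for row in L]

def factored_policy_gradient(L, s, x, a, ret, sigma=1.0):
    """Single-sample REINFORCE gradient w.r.t. L and s.

    Assumes a Gaussian policy a ~ N(theta . x, sigma^2) with theta = L s.
    x: state features, a: sampled action, ret: return observed for the sample.
    """
    theta = matvec(L, s)
    mean = sum(t * xi for t, xi in zip(theta, x))
    # Gradient of ret * log N(a; theta . x, sigma^2) with respect to theta.
    g_theta = [ret * (a - mean) * xi / sigma ** 2 for xi in x]
    # Chain rule through theta = L s.
    g_s = [sum(L[i][j] * g_theta[i] for i in range(len(L)))
           for j in range(len(s))]
    g_L = [[g_theta[i] * s[j] for j in range(len(s))]
           for i in range(len(L))]
    return g_L, g_s

# Toy usage with made-up numbers.
L = [[1.0, 0.0], [0.0, 1.0]]
s = [1.0, 2.0]
g_L, g_s = factored_policy_gradient(L, s, x=[1.0, 1.0], a=4.0, ret=2.0)
```

Under these assumptions, a gradient step on `s` adapts the policy to the current task, while a step on `L` changes knowledge shared with other tasks. How the actual method balances these two updates and protects earlier tasks is not described in the abstract.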

Tasks

Continuous Control, Lifelong Learning, Policy Gradient Methods
