Discrete-to-Deep Supervised Policy Learning

2020-05-05 · Code Available

Budi Kurniawan, Peter Vamplew, Michael Papasimeon, Richard Dazeley, Cameron Foale


Abstract

Neural networks are effective function approximators, but they are hard to train in the reinforcement learning (RL) context, mainly because samples are correlated. For years, researchers have worked around this by employing experience replay or asynchronous parallel-agent systems. This paper proposes Discrete-to-Deep Supervised Policy Learning (D2D-SPL) for training neural networks in RL. D2D-SPL discretises the continuous state space into discrete states and uses actor-critic to learn a policy. It then selects from each discrete state a representative input value and the action with the highest numerical preference, forming an input/target pair. Finally, it uses the input/target pairs from all discrete states to train a classifier. D2D-SPL uses a single agent, needs no experience replay, and learns much faster than state-of-the-art methods. We test our method on two RL environments: Cartpole and an aircraft manoeuvring simulator.
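The pipeline described in the abstract can be sketched in a few lines. The following is a minimal, hypothetical illustration, not the authors' implementation: the discretisation grid, the preference table (which in the paper would come from tabular actor-critic learning, and here is a synthetic stand-in), and the logistic-regression classifier are all assumptions made for the sake of a self-contained example.

```python
import numpy as np

# Sketch of the D2D-SPL pipeline on a 1-D toy state space (all details assumed):
# 1) discretise the continuous state, 2) obtain per-state action preferences
#    (stand-in for tabular actor-critic), 3) turn each discrete state into one
#    (representative input, best action) pair, 4) train a classifier on the pairs.

n_bins, n_actions = 10, 2
bins = np.linspace(-1.0, 1.0, n_bins + 1)          # bin edges over the state range

def discretise(x):
    """Map a continuous state in [-1, 1] to a discrete bin index."""
    return int(np.clip(np.digitize(x, bins) - 1, 0, n_bins - 1))

# Representative input per discrete state: the bin centre.
centres = (bins[:-1] + bins[1:]) / 2.0

# Synthetic numerical preferences H[s, a]; in D2D-SPL these would be learned
# by actor-critic. Here action 1 is preferred for positive states.
H = np.stack([-centres, centres], axis=1)

# One input/target pair per discrete state: target = argmax preference.
X = centres.reshape(-1, 1)
targets = H.argmax(axis=1)

# Train a tiny softmax classifier on the pairs by gradient descent.
W = np.zeros((1, n_actions))
b = np.zeros(n_actions)
for _ in range(500):
    logits = X @ W + b
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    grad = p - np.eye(n_actions)[targets]           # softmax cross-entropy gradient
    W -= 0.5 * (X.T @ grad) / len(X)
    b -= 0.5 * grad.mean(axis=0)

acc = float((np.argmax(X @ W + b, axis=1) == targets).mean())
```

In this toy setting the pairs are linearly separable, so the classifier recovers the discrete policy exactly; the point of the real method is that the trained network then generalises the discrete policy to the full continuous state space.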
