Learnings Options End-to-End for Continuous Action Tasks
2017-11-30
Martin Klissarov, Pierre-Luc Bacon, Jean Harb, Doina Precup
Code
- github.com/mklissa/PPOC (official, in paper, TensorFlow, ★ 0)
- github.com/kkhetarpal/ioc (TensorFlow, ★ 25)
- github.com/shuishida/soaprl (PyTorch, ★ 2)
Abstract
We present new results on learning temporally extended actions for continuous tasks, using the options framework (Sutton et al. [1999b], Precup [2000]). In order to achieve this goal we work with the option-critic architecture (Bacon et al. [2017]) using a deliberation cost and train it with proximal policy optimization (Schulman et al. [2017]) instead of vanilla policy gradient. Results on Mujoco domains are promising, but lead to interesting questions about when a given option should be used, an issue directly connected to the use of initiation sets.
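As a rough illustration of the two ingredients named in the abstract, the sketch below shows a generic PPO clipped surrogate objective and a termination signal that adds a deliberation cost to the option advantage. It is not taken from the linked repositories: the function names, the `clip_eps` default of 0.2, and the scalar interface are assumptions made for this example only.

```python
from typing import Sequence


def ppo_clipped_objective(
    ratios: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,  # assumed default; not specified in the abstract
) -> float:
    """Mean clipped surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    total = 0.0
    for r, a in zip(ratios, advantages):
        # Keep the probability ratio inside [1 - eps, 1 + eps].
        clipped_r = max(1.0 - clip_eps, min(r, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(r * a, clipped_r * a)
    return total / len(ratios)


def termination_signal(
    q_option: float,
    v_state: float,
    deliberation_cost: float,
) -> float:
    """Option advantage plus a deliberation cost (hypothetical helper).

    A positive value suggests the current option is still worth keeping,
    so a learner would lower its termination probability in that state.
    """
    return q_option - v_state + deliberation_cost
```

In this sketch a larger `deliberation_cost` shifts the signal upward, which would discourage frequent option switching; how the actual implementations combine these terms may differ.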