Multi-Pass Q-Networks for Deep Reinforcement Learning with Parameterised Action Spaces
Craig J. Bester, Steven D. James, George D. Konidaris
Code
- github.com/cycraig/MP-DQN (official implementation, PyTorch)
- github.com/cycraig/gym-goal
- github.com/cycraig/gym-platform
- github.com/opendilab/DI-engine/blob/main/ding/policy/pdqn.py (PyTorch)
Abstract
Parameterised actions in reinforcement learning are composed of discrete actions with continuous action-parameters. This provides a framework for solving complex domains that require combining high-level actions with flexible control. The recent P-DQN algorithm extends deep Q-networks to learn over such action spaces. However, it treats all action-parameters as a single joint input to the Q-network, invalidating its theoretical foundations. We analyse the issues with this approach and propose a novel method, multi-pass deep Q-networks, or MP-DQN, to address them. We empirically demonstrate that MP-DQN significantly outperforms P-DQN and other previous algorithms in terms of data efficiency and converged policy performance on the Platform, Robot Soccer Goal, and Half Field Offense domains.
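To illustrate the core idea, here is a minimal sketch of the multi-pass evaluation, with a toy linear function standing in for the deep Q-network; the dimensions, names, and the linear stand-in are illustrative assumptions, not the paper's architecture. In P-DQN, all action-parameters are concatenated into one joint input, so the Q-value of action i spuriously depends on every other action's parameters. MP-DQN instead performs one forward pass per discrete action, exposing only that action's parameters and reading off the i-th output of the i-th pass.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (assumption): 4-dim state, 3 discrete actions, 2 parameters each.
STATE_DIM, NUM_ACTIONS, PARAM_DIM = 4, 3, 2

# Toy linear "Q-network": maps [state ; all action-parameters] to one
# Q-value per discrete action. Stands in for the deep network.
W = rng.standard_normal((NUM_ACTIONS, STATE_DIM + NUM_ACTIONS * PARAM_DIM))

def q_network(state, joint_params):
    """Single pass over the joint input, as in P-DQN."""
    x = np.concatenate([state, joint_params.ravel()])
    return W @ x  # shape: (NUM_ACTIONS,)

def multi_pass_q(state, params):
    """MP-DQN-style evaluation: one pass per discrete action, with only
    that action's parameters non-zero; keep the i-th output of pass i."""
    q_vals = np.empty(NUM_ACTIONS)
    for i in range(NUM_ACTIONS):
        masked = np.zeros_like(params)
        masked[i] = params[i]  # expose only action i's own parameters
        q_vals[i] = q_network(state, masked)[i]  # discard other outputs
    return q_vals

state = rng.standard_normal(STATE_DIM)
params = rng.standard_normal((NUM_ACTIONS, PARAM_DIM))
q = multi_pass_q(state, params)
best_action = int(np.argmax(q))  # greedy discrete-action choice
```

With this masking, Q(s, i, ·) is a function of the i-th action-parameters only, restoring the dependence structure the joint-input formulation breaks; the actual method batches the passes through one network rather than looping.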
Benchmark Results
| Domain | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| Half Field Offense | MP-DQN | Goal probability | 0.91 | — | Unverified |
| Platform | MP-DQN | Return | 0.99 | — | Unverified |
| Robot Soccer Goal | MP-DQN | Goal probability | 0.79 | — | Unverified |