| Rethinking Action Spaces for Reinforcement Learning in End-to-end Dialog Agents with Latent Variable Models | Feb 23, 2019 | Decision Making, Dialogue Generation | Code Available | 0 |
| Fast Efficient Hyperparameter Tuning for Policy Gradients | Feb 18, 2019 | Meta-Learning, Policy Gradient Methods | Code Available | 0 |
| Diverse Exploration via Conjugate Policies for Policy Gradient Methods | Feb 10, 2019 | Policy Gradient Methods | —Unverified | 0 |
| On-Policy Trust Region Policy Optimisation with Replay Buffers | Jan 18, 2019 | Continuous Control, Deep Reinforcement Learning | Code Available | 0 |
| Communication-Efficient Policy Gradient Methods for Distributed Reinforcement Learning | Dec 7, 2018 | Distributed Computing, Multi-agent Reinforcement Learning | —Unverified | 0 |
| AdaFrame: Adaptive Frame Selection for Fast Video Recognition | Nov 29, 2018 | Policy Gradient Methods, Video Recognition | —Unverified | 0 |
| An Off-policy Policy Gradient Theorem Using Emphatic Weightings | Nov 22, 2018 | Policy Gradient Methods, Reinforcement Learning | —Unverified | 0 |
| Reward-estimation variance elimination in sequential decision processes | Nov 15, 2018 | Policy Gradient Methods, Reinforcement Learning | —Unverified | 0 |
| Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning | Nov 4, 2018 | Decoder, Multi-agent Reinforcement Learning | Code Available | 1 |
| Greedy Actor-Critic: A New Conditional Cross-Entropy Method for Policy Improvement | Oct 22, 2018 | Policy Gradient Methods, Q-Learning | Code Available | 0 |