| Learning Self-Imitating Diverse Policies | May 25, 2018 | Continuous Control | Unverified | 0 |
| Multiagent Soft Q-Learning | Apr 25, 2018 | Policy Gradient Methods, Q-Learning | Unverified | 0 |
| On Learning Intrinsic Rewards for Policy Gradient Methods | Apr 17, 2018 | Atari Games, Decision Making | Code Available | 0 |
| Information Maximizing Exploration with a Latent Dynamics Model | Apr 4, 2018 | Continuous Control | Unverified | 0 |
| Variance Reduction for Policy Gradient with Action-Dependent Factorized Baselines | Mar 20, 2018 | Deep Reinforcement Learning, Policy Gradient Methods | Unverified | 0 |
| The Mirage of Action-Dependent Baselines in Reinforcement Learning | Feb 27, 2018 | Policy Gradient Methods, Reinforcement Learning | Code Available | 0 |
| Optimizing over a Restricted Policy Class in Markov Decision Processes | Feb 26, 2018 | Policy Gradient Methods | Unverified | 0 |
| Asynchronous stochastic approximations with asymptotically biased errors and deep multi-agent learning | Feb 22, 2018 | Multi-agent Reinforcement Learning, Policy Gradient Methods | Unverified | 0 |
| Clipped Action Policy Gradient | Feb 21, 2018 | Continuous Control | Code Available | 0 |
| Policy Gradients for Contextual Recommendations | Feb 12, 2018 | Decision Making, Multi-Armed Bandits | Unverified | 0 |