| Policy Optimization by Genetic Distillation | Nov 3, 2017 | Deep Reinforcement Learning, Imitation Learning | Unverified | 0 |
| Action-dependent Control Variates for Policy Optimization via Stein's Identity | Oct 30, 2017 | Policy Gradient Methods, Reinforcement Learning | Code available | 0 |
| Understanding Early Word Learning in Situated Artificial Agents | Oct 26, 2017 | Grounded Language Learning, Policy Gradient Methods | Unverified | 0 |
| Accelerated Reinforcement Learning | Oct 23, 2017 | Policy Gradient Methods, Reinforcement Learning | Unverified | 0 |
| Stochastic Variance Reduction for Policy Gradient Estimation | Oct 17, 2017 | Continuous Control | Unverified | 0 |
| Manifold Regularization for Kernelized LSTD | Oct 15, 2017 | Policy Gradient Methods, Reinforcement Learning | Unverified | 0 |
| Cold-Start Reinforcement Learning with Softmax Policy Gradient | Sep 27, 2017 | Image Captioning, Policy Gradient Methods | Code available | 0 |
| Reproducibility of Benchmarked Deep Reinforcement Learning Tasks for Continuous Control | Aug 10, 2017 | Continuous Control | Code available | 0 |
| Proximal Policy Optimization Algorithms | Jul 20, 2017 | Continuous Control, Dota 2 | Code available | 2 |
| Policy Gradient Methods for Reinforcement Learning with Function Approximation and Action-Dependent Baselines | Jun 20, 2017 | Policy Gradient Methods, Reinforcement Learning | Unverified | 0 |
| A unified view of entropy-regularized Markov decision processes | May 22, 2017 | Policy Gradient Methods, Reinforcement Learning | Unverified | 0 |
| Equivalence Between Policy Gradients and Soft Q-Learning | Apr 21, 2017 | Policy Gradient Methods, Q-Learning | Unverified | 0 |
| Stein Variational Policy Gradient | Apr 7, 2017 | Bayesian Inference, Continuous Control | Unverified | 0 |
| Batch Policy Gradient Methods for Improving Neural Conversation Models | Feb 10, 2017 | Chatbot, Policy Gradient Methods | Unverified | 0 |
| A K-fold Method for Baseline Estimation in Policy Gradient Algorithms | Jan 3, 2017 | MuJoCo, Policy Gradient Methods | Unverified | 0 |
| Sample-efficient Deep Reinforcement Learning for Dialog Control | Dec 18, 2016 | Deep Reinforcement Learning, Policy Gradient Methods | Unverified | 0 |
| Self-critical Sequence Training for Image Captioning | Dec 2, 2016 | Image Captioning, Policy Gradient Methods | Code available | 1 |
| Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic | Nov 7, 2016 | Continuous Control | Code available | 0 |
| Dual Learning for Machine Translation | Nov 1, 2016 | Language Modeling | Code available | 0 |
| Deep Reinforcement Learning for Dialogue Generation | Jun 5, 2016 | Chatbot, Deep Reinforcement Learning | Code available | 0 |
| Policy Gradient Methods for Off-policy Control | Dec 13, 2015 | Policy Gradient Methods | Unverified | 0 |
| High-Dimensional Continuous Control Using Generalized Advantage Estimation | Jun 8, 2015 | Continuous Control | Code available | 0 |
| Trust Region Policy Optimization | Feb 19, 2015 | Atari Games, Policy Gradient Methods | Code available | 1 |
| Policy Gradient for Coherent Risk Measures | Feb 13, 2015 | Policy Gradient Methods, Reinforcement Learning | Unverified | 0 |
| Efficient Baseline-free Sampling in Parameter Exploring Policy Gradients: Super Symmetric PGPE | Dec 13, 2013 | Policy Gradient Methods | Unverified | 0 |
| Optimistic policy iteration and natural actor-critic: A unifying view and a non-optimality result | Dec 1, 2013 | Policy Gradient Methods, Reinforcement Learning | Unverified | 0 |
| Adaptive Step-Size for Policy Gradient Methods | Dec 1, 2013 | Policy Gradient Methods, Reinforcement Learning | Unverified | 0 |
| A reinterpretation of the policy oscillation phenomenon in approximate policy iteration | Dec 1, 2011 | Policy Gradient Methods, Reinforcement Learning | Unverified | 0 |
| Analysis and Improvement of Policy Gradient Estimation | Dec 1, 2011 | Policy Gradient Methods, Reinforcement Learning | Unverified | 0 |
| On a Connection between Importance Sampling and the Likelihood Ratio Policy Gradient | Dec 1, 2010 | Policy Gradient Methods, Reinforcement Learning | Unverified | 0 |
| Natural Policy Gradient Methods with Parameter-based Exploration for Control Tasks | Dec 1, 2010 | Policy Gradient Methods | Unverified | 0 |
| Policy Search for Motor Primitives in Robotics | Dec 1, 2008 | Imitation Learning, Policy Gradient Methods | Unverified | 0 |