| A unified view of entropy-regularized Markov decision processes | May 22, 2017 | Policy Gradient Methods, reinforcement-learning | Unverified | 0 |
| Equivalence Between Policy Gradients and Soft Q-Learning | Apr 21, 2017 | Policy Gradient Methods, Q-Learning | Unverified | 0 |
| Stein Variational Policy Gradient | Apr 7, 2017 | Bayesian Inference, continuous-control | Unverified | 0 |
| Batch Policy Gradient Methods for Improving Neural Conversation Models | Feb 10, 2017 | Chatbot, Policy Gradient Methods | Unverified | 0 |
| A K-fold Method for Baseline Estimation in Policy Gradient Algorithms | Jan 3, 2017 | MuJoCo, Policy Gradient Methods | Unverified | 0 |
| Sample-efficient Deep Reinforcement Learning for Dialog Control | Dec 18, 2016 | Deep Reinforcement Learning, Policy Gradient Methods | Unverified | 0 |
| Self-critical Sequence Training for Image Captioning | Dec 2, 2016 | Image Captioning, Policy Gradient Methods | Code Available | 1 |
| Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic | Nov 7, 2016 | continuous-control, Continuous Control | Code Available | 0 |
| Dual Learning for Machine Translation | Nov 1, 2016 | Language Modeling, Language Modelling | Code Available | 0 |
| Deep Reinforcement Learning for Dialogue Generation | Jun 5, 2016 | Chatbot, Deep Reinforcement Learning | Code Available | 0 |