| Rethinking Action Spaces for Reinforcement Learning in End-to-end Dialog Agents with Latent Variable Models | Feb 23, 2019 | Decision Making, Dialogue Generation | Code Available | 0 |
| Fast Efficient Hyperparameter Tuning for Policy Gradients | Feb 18, 2019 | Meta-Learning, Policy Gradient Methods | Code Available | 0 |
| Diverse Exploration via Conjugate Policies for Policy Gradient Methods | Feb 10, 2019 | Policy Gradient Methods | Unverified | 0 |
| On-Policy Trust Region Policy Optimisation with Replay Buffers | Jan 18, 2019 | Continuous Control, Deep Reinforcement Learning | Code Available | 0 |
| Communication-Efficient Policy Gradient Methods for Distributed Reinforcement Learning | Dec 7, 2018 | Distributed Computing, Multi-agent Reinforcement Learning | Unverified | 0 |
| AdaFrame: Adaptive Frame Selection for Fast Video Recognition | Nov 29, 2018 | Policy Gradient Methods, Video Recognition | Unverified | 0 |
| An Off-policy Policy Gradient Theorem Using Emphatic Weightings | Nov 22, 2018 | Policy Gradient Methods, Reinforcement Learning | Unverified | 0 |
| Reward-estimation variance elimination in sequential decision processes | Nov 15, 2018 | Policy Gradient Methods, Reinforcement Learning | Unverified | 0 |
| Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning | Nov 4, 2018 | Decoder, Multi-agent Reinforcement Learning | Code Available | 1 |
| Greedy Actor-Critic: A New Conditional Cross-Entropy Method for Policy Improvement | Oct 22, 2018 | Policy Gradient Methods, Q-Learning | Code Available | 0 |
| Risk-Sensitive Reinforcement Learning via Policy Gradient Search | Oct 22, 2018 | Policy Gradient Methods, reinforcement-learning | Unverified | 0 |
| Policy Gradient in Partially Observable Environments: Approximation and Convergence | Oct 18, 2018 | Decision Making, Policy Gradient Methods | Unverified | 0 |
| Where Did My Optimum Go?: An Empirical Analysis of Gradient Descent Optimization in Policy Gradient Methods | Oct 5, 2018 | continuous-control, Continuous Control | Code Available | 0 |
| CaLcs: Continuously Approximating Longest Common Subsequence for Sequence Level Optimization | Oct 1, 2018 | Abstractive Text Summarization, Image Captioning | Unverified | 0 |
| Training for Diversity in Image Paragraph Captioning | Oct 1, 2018 | Diversity, Image Captioning | Code Available | 0 |
| Countering Language Drift via Grounding | Sep 27, 2018 | Language Modeling, Language Modelling | Unverified | 0 |
| Assumption Questioning: Latent Copying and Reward Exploitation in Question Generation | Sep 27, 2018 | Inductive Bias, Machine Translation | Unverified | 0 |
| The wisdom of the crowd: reliable deep reinforcement learning through ensembles of Q-functions | Sep 27, 2018 | Deep Reinforcement Learning, Policy Gradient Methods | Unverified | 0 |
| Improvements on Hindsight Learning | Sep 16, 2018 | Policy Gradient Methods, reinforcement-learning | Unverified | 0 |
| Image Captioning based on Deep Reinforcement Learning | Sep 13, 2018 | Deep Reinforcement Learning, Image Captioning | Unverified | 0 |
| Learning to Interrupt: A Hierarchical Deep Reinforcement Learning Framework for Efficient Exploration | Jul 30, 2018 | Deep Reinforcement Learning, Efficient Exploration | Unverified | 0 |
| Remember and Forget for Experience Replay | Jul 16, 2018 | Deep Reinforcement Learning, Policy Gradient Methods | Code Available | 0 |
| Variance Reduction for Reinforcement Learning in Input-Driven Environments | Jul 6, 2018 | Meta-Learning, MuJoCo | Unverified | 0 |
| Learning Goal-Oriented Visual Dialog via Tempered Policy Gradient | Jul 2, 2018 | Deep Reinforcement Learning, Policy Gradient Methods | Code Available | 0 |
| Policy Optimization with Demonstrations | Jul 1, 2018 | Policy Gradient Methods, Reinforcement Learning | Unverified | 0 |
| Focused Hierarchical RNNs for Conditional Sequence Processing | Jun 12, 2018 | Open-Domain Question Answering, Policy Gradient Methods | Unverified | 0 |
| Fingerprint Policy Optimisation for Robust Reinforcement Learning | May 27, 2018 | Bayesian Optimisation, Continuous Control | Unverified | 0 |
| Learning Self-Imitating Diverse Policies | May 25, 2018 | continuous-control, Continuous Control | Unverified | 0 |
| Multiagent Soft Q-Learning | Apr 25, 2018 | Policy Gradient Methods, Q-Learning | Unverified | 0 |
| On Learning Intrinsic Rewards for Policy Gradient Methods | Apr 17, 2018 | Atari Games, Decision Making | Code Available | 0 |
| Information Maximizing Exploration with a Latent Dynamics Model | Apr 4, 2018 | continuous-control, Continuous Control | Unverified | 0 |
| Variance Reduction for Policy Gradient with Action-Dependent Factorized Baselines | Mar 20, 2018 | Deep Reinforcement Learning, Policy Gradient Methods | Unverified | 0 |
| The Mirage of Action-Dependent Baselines in Reinforcement Learning | Feb 27, 2018 | Policy Gradient Methods, reinforcement-learning | Code Available | 0 |
| Optimizing over a Restricted Policy Class in Markov Decision Processes | Feb 26, 2018 | Policy Gradient Methods | Unverified | 0 |
| Asynchronous stochastic approximations with asymptotically biased errors and deep multi-agent learning | Feb 22, 2018 | Multi-agent Reinforcement Learning, Policy Gradient Methods | Unverified | 0 |
| Clipped Action Policy Gradient | Feb 21, 2018 | continuous-control, Continuous Control | Code Available | 0 |
| Policy Gradients for Contextual Recommendations | Feb 12, 2018 | Decision Making, Multi-Armed Bandits | Unverified | 0 |
| Global Convergence of Policy Gradient Methods for the Linear Quadratic Regulator | Jan 15, 2018 | continuous-control, Continuous Control | Unverified | 0 |
| Expected Policy Gradients for Reinforcement Learning | Jan 10, 2018 | Policy Gradient Methods, reinforcement-learning | Unverified | 0 |
| Global Convergence of Policy Gradient Methods for Linearized Control Problems | Jan 1, 2018 | continuous-control, Continuous Control | Unverified | 0 |
| Predicting Multiple Actions for Stochastic Continuous Control | Jan 1, 2018 | continuous-control, Continuous Control | Unverified | 0 |
| Adversarial Policy Gradient for Alternating Markov Games | Jan 1, 2018 | Policy Gradient Methods, Reinforcement Learning | Unverified | 0 |
| Action-dependent Control Variates for Policy Optimization via Stein Identity | Jan 1, 2018 | Policy Gradient Methods, reinforcement-learning | Unverified | 0 |
| Understanding Grounded Language Learning Agents | Jan 1, 2018 | Grounded language learning, Policy Gradient Methods | Unverified | 0 |
| Improving Exploration in Evolution Strategies for Deep Reinforcement Learning via a Population of Novelty-Seeking Agents | Dec 18, 2017 | Deep Reinforcement Learning, Policy Gradient Methods | Code Available | 0 |
| Bayesian Policy Gradients via Alpha Divergence Dropout Inference | Dec 6, 2017 | continuous-control, Continuous Control | Code Available | 0 |
| Adaptive Batch Size for Safe Policy Gradients | Dec 1, 2017 | Policy Gradient Methods, Reinforcement Learning | Unverified | 0 |
| Divide-and-Conquer Reinforcement Learning | Nov 27, 2017 | Deep Reinforcement Learning, Policy Gradient Methods | Code Available | 0 |
| Run, skeleton, run: skeletal model in a physics-based simulation | Nov 18, 2017 | Navigate, Policy Gradient Methods | Code Available | 0 |
| Hindsight policy gradients | Nov 16, 2017 | Policy Gradient Methods, reinforcement-learning | Code Available | 0 |