| Time Discretization-Invariant Safe Action Repetition for Policy Gradient Methods | Nov 6, 2021 | MuJoCo, Policy Gradient Methods | Code Available | 0 |
| Run, skeleton, run: skeletal model in a physics-based simulation | Nov 18, 2017 | Navigate, Policy Gradient Methods | Code Available | 0 |
| Client Selection for Federated Policy Optimization with Environment Heterogeneity | May 18, 2023 | MuJoCo, Policy Gradient Methods | Code Available | 0 |
| Training for Diversity in Image Paragraph Captioning | Oct 1, 2018 | Diversity, Image Captioning | Code Available | 0 |
| Greedy Actor-Critic: A New Conditional Cross-Entropy Method for Policy Improvement | Oct 22, 2018 | Policy Gradient Methods, Q-Learning | Code Available | 0 |
| Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic | Nov 7, 2016 | continuous-control, Continuous Control | Code Available | 0 |
| Evaluating Rewards for Question Generation Models | Feb 28, 2019 | Machine Translation, Policy Gradient Methods | Code Available | 0 |
| Dual Learning for Machine Translation | Nov 1, 2016 | Language Modeling, Language Modelling | Code Available | 0 |
| On Learning Intrinsic Rewards for Policy Gradient Methods | Apr 17, 2018 | Atari Games, Decision Making | Code Available | 0 |
| Cold-Start Reinforcement Learning with Softmax Policy Gradient | Sep 27, 2017 | Image Captioning, Policy Gradient Methods | Code Available | 0 |