| Experimental design for MRI by greedy policy search | Oct 30, 2020 | Experimental DesignPolicy Gradient Methods | CodeCode Available | 1 |
| Efficient Wasserstein Natural Gradients for Reinforcement Learning | Oct 12, 2020 | Policy Gradient Methodsreinforcement-learning | CodeCode Available | 1 |
| Lifelong Policy Gradient Learning of Factored Policies for Faster Training Without Forgetting | Jul 14, 2020 | Lifelong learningPolicy Gradient Methods | CodeCode Available | 1 |
| Deep Bayesian Quadrature Policy Optimization | Jun 28, 2020 | continuous-controlContinuous Control | CodeCode Available | 1 |
| Competitive Policy Optimization | Jun 18, 2020 | Policy Gradient Methods | CodeCode Available | 1 |
| Invariant Policy Optimization: Towards Stronger Generalization in Reinforcement Learning | Jun 1, 2020 | Policy Gradient Methodsreinforcement-learning | CodeCode Available | 1 |
| Distributional Policy Optimization: An Alternative Approach for Continuous Control | May 23, 2019 | continuous-controlContinuous Control | CodeCode Available | 1 |
| Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning | Nov 4, 2018 | DecoderMulti-agent Reinforcement Learning | CodeCode Available | 1 |
| Self-critical Sequence Training for Image Captioning | Dec 2, 2016 | Image CaptioningPolicy Gradient Methods | CodeCode Available | 1 |
| Trust Region Policy Optimization | Feb 19, 2015 | Atari GamesPolicy Gradient Methods | CodeCode Available | 1 |
| Improving DAPO from a Mixed-Policy Perspective | Jul 17, 2025 | Policy Gradient Methods | —Unverified | 0 |
| Local Pairwise Distance Matching for Backpropagation-Free Reinforcement Learning | Jul 15, 2025 | Policy Gradient Methodsreinforcement-learning | —Unverified | 0 |
| Solving Zero-Sum Convex Markov Games | Jun 19, 2025 | Policy Gradient Methods | —Unverified | 0 |
| On Global Convergence Rates for Federated Policy Gradient under Heterogeneous Environment | May 29, 2025 | Federated LearningPolicy Gradient Methods | —Unverified | 0 |
| Equivalence of stochastic and deterministic policy gradients | May 29, 2025 | continuous-controlContinuous Control | —Unverified | 0 |
| Enhanced DACER Algorithm with High Diffusion Efficiency | May 29, 2025 | DenoisingImitation Learning | —Unverified | 0 |
| Learning from Algorithm Feedback: One-Shot SAT Solver Guidance with GNNs | May 21, 2025 | Combinatorial OptimizationPolicy Gradient Methods | —Unverified | 0 |
| Policy Testing in Markov Decision Processes | May 21, 2025 | Policy Gradient Methods | —Unverified | 0 |
| KIPPO: Koopman-Inspired Proximal Policy Optimization | May 20, 2025 | Computational Efficiencycontinuous-control | —Unverified | 0 |
| Self-Evolving Curriculum for LLM Reasoning | May 20, 2025 | Code GenerationPolicy Gradient Methods | —Unverified | 0 |
| Token-Efficient RL for LLM Reasoning | Apr 29, 2025 | Policy Gradient MethodsReinforcement Learning (RL) | —Unverified | 0 |
| Evolutionary Policy Optimization | Apr 17, 2025 | Policy Gradient MethodsReinforcement Learning (RL) | —Unverified | 0 |
| Hierarchical Policy-Gradient Reinforcement Learning for Multi-Agent Shepherding Control of Non-Cohesive Targets | Apr 3, 2025 | Policy Gradient Methodsreinforcement-learning | CodeCode Available | 0 |
| Ordering-based Conditions for Global Convergence of Policy Gradient Methods | Apr 2, 2025 | Policy Gradient Methods | —Unverified | 0 |
| Analysis of On-policy Policy Gradient Methods under the Distribution Mismatch | Mar 28, 2025 | Policy Gradient Methods | —Unverified | 0 |