| A reinterpretation of the policy oscillation phenomenon in approximate policy iteration | Dec 1, 2011 | Policy Gradient Methods, Reinforcement Learning | —Unverified | 0 |
| Entropy annealing for policy mirror descent in continuous time and space | May 30, 2024 | Policy Gradient Methods | —Unverified | 0 |
| Entropy Regularization with Discounted Future State Distribution in Policy Gradient Methods | Dec 11, 2019 | Policy Gradient Methods | —Unverified | 0 |
| Adversarial Policy Gradient for Alternating Markov Games | Jan 1, 2018 | Policy Gradient Methods, Reinforcement Learning | —Unverified | 0 |
| Equivalence Between Policy Gradients and Soft Q-Learning | Apr 21, 2017 | Policy Gradient Methods, Q-Learning | —Unverified | 0 |
| Equivalence of stochastic and deterministic policy gradients | May 29, 2025 | continuous-control, Continuous Control | —Unverified | 0 |
| Optimal Rates of Convergence for Entropy Regularization in Discounted Markov Decision Processes | Jun 6, 2024 | Policy Gradient Methods | —Unverified | 0 |
| Beyond Exact Gradients: Convergence of Stochastic Soft-Max Policy Gradient Methods with Entropy Regularization | Oct 19, 2021 | Policy Gradient Methods, Reinforcement Learning (RL) | —Unverified | 0 |
| Evolutionary Policy Optimization | Apr 17, 2025 | Policy Gradient Methods, Reinforcement Learning (RL) | —Unverified | 0 |
| Evolutionary Selective Imitation: Interpretable Agents by Imitation Learning Without a Demonstrator | Sep 17, 2020 | Imitation Learning, OpenAI Gym | —Unverified | 0 |
| Improving a sequence-to-sequence nlp model using a reinforcement learning policy algorithm | Dec 28, 2022 | Chatbot, Deep Reinforcement Learning | —Unverified | 0 |
| Improvements on Hindsight Learning | Sep 16, 2018 | Policy Gradient Methods, reinforcement-learning | —Unverified | 0 |
| Expected Policy Gradients for Reinforcement Learning | Jan 10, 2018 | Policy Gradient Methods, reinforcement-learning | —Unverified | 0 |
| Improving DAPO from a Mixed-Policy Perspective | Jul 17, 2025 | Policy Gradient Methods | —Unverified | 0 |
| Identifying Policy Gradient Subspaces | Jan 12, 2024 | continuous-control, Continuous Control | —Unverified | 0 |
| Actor-Critic Policy Optimization in a Large-Scale Imperfect-Information Game | Sep 29, 2021 | counterfactual, Deep Reinforcement Learning | —Unverified | 0 |
| Factored Policy Gradients: Leveraging Structure for Efficient Learning in MOMDPs | Feb 20, 2021 | Policy Gradient Methods | —Unverified | 0 |
| Fast Global Convergence of Natural Policy Gradient Methods with Entropy Regularization | Jul 13, 2020 | Policy Gradient Methods | —Unverified | 0 |
| Federated Natural Policy Gradient and Actor Critic Methods for Multi-task Reinforcement Learning | Nov 1, 2023 | Decision Making, Policy Gradient Methods | —Unverified | 0 |
| Federated Reinforcement Learning with Constraint Heterogeneity | May 6, 2024 | Language Modeling, Language Modelling | —Unverified | 0 |
| Momentum-Based Policy Gradient with Second-Order Information | May 17, 2022 | Policy Gradient Methods | —Unverified | 0 |
| Fill-and-Spill: Deep Reinforcement Learning Policy Gradient Methods for Reservoir Operation Decision and Control | Mar 7, 2024 | Deep Reinforcement Learning, Policy Gradient Methods | —Unverified | 0 |
| Image Captioning based on Deep Reinforcement Learning | Sep 13, 2018 | Deep Reinforcement Learning, Image Captioning | —Unverified | 0 |
| Improving Reward-Conditioned Policies for Multi-Armed Bandits using Normalized Weight Functions | Jun 16, 2024 | Multi-Armed Bandits, Policy Gradient Methods | —Unverified | 0 |
| Fingerprint Policy Optimisation for Robust Reinforcement Learning | May 27, 2018 | Bayesian Optimisation, Continuous Control | —Unverified | 0 |
| Focused Hierarchical RNNs for Conditional Sequence Processing | Jun 12, 2018 | Open-Domain Question Answering, Policy Gradient Methods | —Unverified | 0 |
| Improving Sample Efficiency and Multi-Agent Communication in RL-based Train Rescheduling | Apr 28, 2020 | Policy Gradient Methods, reinforcement-learning | —Unverified | 0 |
| Convergence and Price of Anarchy Guarantees of the Softmax Policy Gradient in Markov Potential Games | Jun 15, 2022 | Policy Gradient Methods | —Unverified | 0 |
| Convergence and Optimality of Policy Gradient Methods in Weakly Smooth Settings | Oct 30, 2021 | Policy Gradient Methods, reinforcement-learning | —Unverified | 0 |
| An Initial Introduction to Cooperative Multi-Agent Reinforcement Learning | May 10, 2024 | Misconceptions, Multi-agent Reinforcement Learning | —Unverified | 0 |
| On Linear Convergence of Policy Gradient Methods for Finite MDPs | Jul 21, 2020 | Policy Gradient Methods | —Unverified | 0 |
| Global Convergence of Policy Gradient Methods in Reinforcement Learning, Games and Control | Oct 8, 2023 | Decision Making, Policy Gradient Methods | —Unverified | 0 |
| Global Convergence of Policy Gradient Methods to (Almost) Locally Optimal Policies | Jun 19, 2019 | Autonomous Driving, Policy Gradient Methods | —Unverified | 0 |
| Controlling an Inverted Pendulum with Policy Gradient Methods-A Tutorial | May 17, 2021 | OpenAI Gym, Policy Gradient Methods | —Unverified | 0 |
| Adaptive Step-Size for Policy Gradient Methods | Dec 1, 2013 | Policy Gradient Methods, Reinforcement Learning | —Unverified | 0 |
| Diversity-Inducing Policy Gradient: Using Maximum Mean Discrepancy to Find a Set of Diverse Policies | May 31, 2019 | Diversity, Policy Gradient Methods | —Unverified | 0 |
| Global Convergence Using Policy Gradient Methods for Model-free Markovian Jump Linear Quadratic Control | Nov 30, 2021 | Policy Gradient Methods | —Unverified | 0 |
| Control randomisation approach for policy gradient and application to reinforcement learning in optimal switching | Apr 27, 2024 | Policy Gradient Methods | —Unverified | 0 |
| Global Optimality Guarantees For Policy Gradient Methods | Jun 5, 2019 | Policy Gradient Methods, Reinforcement Learning | —Unverified | 0 |
| Towards Global Optimality for Practical Average Reward Reinforcement Learning without Mixing Time Oracles | Mar 18, 2024 | Policy Gradient Methods | —Unverified | 0 |
| Greedification Operators for Policy Optimization: Investigating Forward and Reverse KL Divergences | Jul 17, 2021 | Policy Gradient Methods | —Unverified | 0 |
| Guided Adaptive Credit Assignment for Sample Efficient Policy Optimization | Sep 25, 2019 | Instruction Following, Policy Gradient Methods | —Unverified | 0 |
| A Policy Gradient Framework for Stochastic Optimal Control Problems with Global Convergence Guarantee | Feb 11, 2023 | Policy Gradient Methods | —Unverified | 0 |
| Ad Headline Generation using Self-Critical Masked Language Model | Jun 1, 2021 | Headline Generation, Language Modeling | —Unverified | 0 |
| Convergence of policy gradient methods for finite-horizon exploratory linear-quadratic control problems | Nov 1, 2022 | Policy Gradient Methods | —Unverified | 0 |
| How are policy gradient methods affected by the limits of control? | Jun 14, 2022 | Policy Gradient Methods | —Unverified | 0 |
| Correcting discount-factor mismatch in on-policy policy gradient methods | Jun 23, 2023 | OpenAI Gym, Policy Gradient Methods | —Unverified | 0 |
| Approximation Benefits of Policy Gradient Methods with Aggregated States | Jul 22, 2020 | Policy Gradient Methods | —Unverified | 0 |
| Countering Language Drift via Grounding | Sep 27, 2018 | Language Modeling, Language Modelling | —Unverified | 0 |
| Global Convergence of Policy Gradient Methods for Linearized Control Problems | Jan 1, 2018 | continuous-control, Continuous Control | —Unverified | 0 |