| Policy Tree Network | Sep 25, 2019 | Model-based Reinforcement Learning, MuJoCo | Unverified | 0 |
| Predicting Multiple Actions for Stochastic Continuous Control | Jan 1, 2018 | continuous-control, Continuous Control | Unverified | 0 |
| On the Second-Order Convergence of Biased Policy Gradient Algorithms | Nov 5, 2023 | Policy Gradient Methods | Unverified | 0 |
| Privacy Preserving Multi-Agent Reinforcement Learning in Supply Chains | Dec 9, 2023 | Multi-agent Reinforcement Learning, Policy Gradient Methods | Unverified | 0 |
| Programmatic Reinforcement Learning without Oracles | Sep 29, 2021 | Bilevel Optimization, Deep Reinforcement Learning | Unverified | 0 |
| Provable Policy Gradient Methods for Average-Reward Markov Potential Games | Mar 9, 2024 | Policy Gradient Methods | Unverified | 0 |
| Provably Convergent Policy Optimization via Metric-aware Trust Region Methods | Jun 25, 2023 | continuous-control, Continuous Control | Unverified | 0 |
| Provably Efficient Policy Optimization for Two-Player Zero-Sum Markov Games | Feb 17, 2021 | Policy Gradient Methods, Vocal Bursts Valence Prediction | Unverified | 0 |
| Proximal Policy Optimization for Tracking Control Exploiting Future Reference Information | Jul 20, 2021 | Policy Gradient Methods, reinforcement-learning | Unverified | 0 |
| Proximal Policy Optimization with Continuous Bounded Action Space via the Beta Distribution | Nov 3, 2021 | continuous-control, Continuous Control | Unverified | 0 |
| Q-SFT: Q-Learning for Language Models via Supervised Fine-Tuning | Nov 7, 2024 | Offline RL, Policy Gradient Methods | Unverified | 0 |
| ReAct Meets ActRe: When Language Agents Enjoy Training Data Autonomy | Mar 21, 2024 | Policy Gradient Methods | Unverified | 0 |
| Reinforcement Learning: An Overview | Dec 6, 2024 | Decision Making, Deep Reinforcement Learning | Unverified | 0 |
| Reinforcement Learning based Sequential Batch-sampling for Bayesian Optimal Experimental Design | Dec 21, 2021 | Deep Reinforcement Learning, Experimental Design | Unverified | 0 |
| Reinforcement Learning in Linear Quadratic Deep Structured Teams: Global Convergence of Policy Gradient Methods | Nov 29, 2020 | Policy Gradient Methods | Unverified | 0 |
| Residual Policy Gradient: A Reward View of KL-regularized Objective | Mar 14, 2025 | Imitation Learning, MuJoCo | Unverified | 0 |
| Fast Efficient Hyperparameter Tuning for Policy Gradient Methods | Dec 1, 2019 | Policy Gradient Methods | Code Available | 0 |
| Learning Zero-Sum Linear Quadratic Games with Improved Sample Complexity and Last-Iterate Convergence | Sep 8, 2023 | Multi-agent Reinforcement Learning, Policy Gradient Methods | Code Available | 0 |
| Leveraging class abstraction for commonsense reinforcement learning via residual policy gradient methods | Jan 28, 2022 | Knowledge Graphs, Policy Gradient Methods | Code Available | 0 |
| Synthesis of Stabilizing Recurrent Equilibrium Network Controllers | Mar 31, 2022 | Policy Gradient Methods | Code Available | 0 |
| Enabling Efficient, Reliable Real-World Reinforcement Learning with Approximate Physics-Based Models | Jul 16, 2023 | Policy Gradient Methods | Code Available | 0 |
| Deep Reinforcement Learning for Dialogue Generation | Jun 5, 2016 | Chatbot, Deep Reinforcement Learning | Code Available | 0 |
| Sample Efficient Policy Gradient Methods with Recursive Variance Reduction | Sep 18, 2019 | Policy Gradient Methods, reinforcement-learning | Code Available | 0 |
| Fast Efficient Hyperparameter Tuning for Policy Gradients | Feb 18, 2019 | Meta-Learning, Policy Gradient Methods | Code Available | 0 |
| Action-depedent Control Variates for Policy Optimization via Stein's Identity | Oct 30, 2017 | Policy Gradient Methods, reinforcement-learning | Code Available | 0 |
| Remember and Forget for Experience Replay | Jul 16, 2018 | Deep Reinforcement Learning, Policy Gradient Methods | Code Available | 0 |
| Reproducibility of Benchmarked Deep Reinforcement Learning Tasks for Continuous Control | Aug 10, 2017 | continuous-control, Continuous Control | Code Available | 0 |
| Where Did My Optimum Go?: An Empirical Analysis of Gradient Descent Optimization in Policy Gradient Methods | Oct 5, 2018 | continuous-control, Continuous Control | Code Available | 0 |
| Shapley Q-value: A Local Reward Approach to Solve Global Reward Games | Jul 11, 2019 | Multi-agent Reinforcement Learning, Policy Gradient Methods | Code Available | 0 |
| Rethinking Action Spaces for Reinforcement Learning in End-to-end Dialog Agents with Latent Variable Models | Feb 23, 2019 | Decision Making, Dialogue Generation | Code Available | 0 |
| The Mirage of Action-Dependent Baselines in Reinforcement Learning | Feb 27, 2018 | Policy Gradient Methods, reinforcement-learning | Code Available | 0 |
| Matrix Low-Rank Approximation For Policy Gradient Methods | May 27, 2024 | Matrix Completion, Policy Gradient Methods | Code Available | 0 |
| Oracle Complexity Reduction for Model-free LQR: A Stochastic Variance-Reduced Policy Gradient Approach | Sep 19, 2023 | Policy Gradient Methods | Code Available | 0 |
| MDPGT: Momentum-based Decentralized Policy Gradient Tracking | Dec 6, 2021 | Multi-agent Reinforcement Learning, Policy Gradient Methods | Code Available | 0 |
| Predictable Reinforcement Learning Dynamics through Entropy Rate Minimization | Nov 30, 2023 | Policy Gradient Methods, reinforcement-learning | Code Available | 0 |
| A Nonparametric Off-Policy Policy Gradient | Jan 8, 2020 | Density Estimation, Policy Gradient Methods | Code Available | 0 |
| Clipped-Objective Policy Gradients for Pessimistic Policy Optimization | Nov 10, 2023 | Deep Reinforcement Learning, Multi-Task Learning | Code Available | 0 |
| Model-free and Bayesian Ensembling Model-based Deep Reinforcement Learning for Particle Accelerator Control Demonstrated on the FERMI FEL | Dec 17, 2020 | Deep Reinforcement Learning, model | Code Available | 0 |
| Deep Reinforcement Learning Algorithm for Dynamic Pricing of Express Lanes with Multiple Access Locations | Sep 10, 2019 | Deep Reinforcement Learning, Policy Gradient Methods | Code Available | 0 |
| PC-PG: Policy Cover Directed Exploration for Provable Policy Gradient Learning | Jul 16, 2020 | Policy Gradient Methods, Q-Learning | Code Available | 0 |
| Accelerated Policy Gradient: On the Convergence Rates of the Nesterov Momentum for Reinforcement Learning | Oct 18, 2023 | Policy Gradient Methods, reinforcement-learning | Code Available | 0 |
| Momentum-Based Policy Gradient Methods | Jul 13, 2020 | Policy Gradient Methods | Code Available | 0 |
| Health-Informed Policy Gradients for Multi-Agent Reinforcement Learning | Aug 2, 2019 | Multi-agent Reinforcement Learning, Policy Gradient Methods | Code Available | 0 |
| Hierarchical Policy-Gradient Reinforcement Learning for Multi-Agent Shepherding Control of Non-Cohesive Targets | Apr 3, 2025 | Policy Gradient Methods, reinforcement-learning | Code Available | 0 |
| High-Dimensional Continuous Control Using Generalized Advantage Estimation | Jun 8, 2015 | continuous-control, Continuous Control | Code Available | 0 |
| Hindsight-DICE: Stable Credit Assignment for Deep Reinforcement Learning | Jul 21, 2023 | Decision Making, Deep Reinforcement Learning | Code Available | 0 |
| Hindsight policy gradients | Nov 16, 2017 | Policy Gradient Methods, reinforcement-learning | Code Available | 0 |
| Hindsight Trust Region Policy Optimization | Jul 29, 2019 | Atari Games, Policy Gradient Methods | Code Available | 0 |
| Hindsight Value Function for Variance Reduction in Stochastic Dynamic Environment | Jul 26, 2021 | Deep Reinforcement Learning, Policy Gradient Methods | Code Available | 0 |
| A general class of surrogate functions for stable and efficient reinforcement learning | Aug 12, 2021 | MuJoCo, Policy Gradient Methods | Code Available | 0 |