| On the Second-Order Convergence of Biased Policy Gradient Algorithms | Nov 5, 2023 | Policy Gradient Methods | —Unverified | 0 | 0 |
| Privacy Preserving Multi-Agent Reinforcement Learning in Supply Chains | Dec 9, 2023 | Multi-agent Reinforcement Learning, Policy Gradient Methods | —Unverified | 0 | 0 |
| Programmatic Reinforcement Learning without Oracles | Sep 29, 2021 | Bilevel Optimization, Deep Reinforcement Learning | —Unverified | 0 | 0 |
| Provable Policy Gradient Methods for Average-Reward Markov Potential Games | Mar 9, 2024 | Policy Gradient Methods | —Unverified | 0 | 0 |
| Provably Convergent Policy Optimization via Metric-aware Trust Region Methods | Jun 25, 2023 | continuous-control, Continuous Control | —Unverified | 0 | 0 |
| Provably Efficient Policy Optimization for Two-Player Zero-Sum Markov Games | Feb 17, 2021 | Policy Gradient Methods, Vocal Bursts Valence Prediction | —Unverified | 0 | 0 |
| Proximal Policy Optimization for Tracking Control Exploiting Future Reference Information | Jul 20, 2021 | Policy Gradient Methods, reinforcement-learning | —Unverified | 0 | 0 |
| Proximal Policy Optimization with Continuous Bounded Action Space via the Beta Distribution | Nov 3, 2021 | continuous-control, Continuous Control | —Unverified | 0 | 0 |
| Q-SFT: Q-Learning for Language Models via Supervised Fine-Tuning | Nov 7, 2024 | Offline RL, Policy Gradient Methods | —Unverified | 0 | 0 |
| ReAct Meets ActRe: When Language Agents Enjoy Training Data Autonomy | Mar 21, 2024 | Policy Gradient Methods | —Unverified | 0 | 0 |
| Reinforcement Learning based Sequential Batch-sampling for Bayesian Optimal Experimental Design | Dec 21, 2021 | Deep Reinforcement Learning, Experimental Design | —Unverified | 0 | 0 |
| Reinforcement Learning in Linear Quadratic Deep Structured Teams: Global Convergence of Policy Gradient Methods | Nov 29, 2020 | Policy Gradient Methods | —Unverified | 0 | 0 |
| Residual Policy Gradient: A Reward View of KL-regularized Objective | Mar 14, 2025 | Imitation Learning, MuJoCo | —Unverified | 0 | 0 |
| Rethinking Deep Policy Gradients via State-Wise Policy Improvement | Oct 19, 2020 | Policy Gradient Methods, Value prediction | —Unverified | 0 | 0 |
| Reusing Historical Trajectories in Natural Policy Gradient via Importance Sampling: Convergence and Convergence Rate | Mar 1, 2024 | Policy Gradient Methods | —Unverified | 0 | 0 |
| Reward-estimation variance elimination in sequential decision processes | Nov 15, 2018 | Policy Gradient Methods, Reinforcement Learning | —Unverified | 0 | 0 |
| Riemannian stochastic optimization methods avoid strict saddle points | Nov 4, 2023 | Dictionary Learning, Policy Gradient Methods | —Unverified | 0 | 0 |
| Risk-Sensitive Reinforcement Learning via Policy Gradient Search | Oct 22, 2018 | Policy Gradient Methods, reinforcement-learning | —Unverified | 0 | 0 |
| RL Dreams: Policy Gradient Optimization for Score Distillation based 3D Generation | Dec 8, 2023 | 3D Generation, Denoising | —Unverified | 0 | 0 |
| ROCM: RLHF on consistency models | Mar 8, 2025 | Policy Gradient Methods | —Unverified | 0 | 0 |
| Safe Reinforcement Learning via Projection on a Safe Set: How to Achieve Optimality? | Apr 2, 2020 | Policy Gradient Methods, Q-Learning | —Unverified | 0 | 0 |
| Sample Complexity of Neural Policy Mirror Descent for Policy Optimization on Low-Dimensional Manifolds | Sep 25, 2023 | Policy Gradient Methods, Reinforcement Learning (RL) | —Unverified | 0 | 0 |
| Sample Complexity of Policy Gradient Finding Second-Order Stationary Points | Dec 2, 2020 | Policy Gradient Methods, Reinforcement Learning (RL) | —Unverified | 0 | 0 |
| Sample-efficient actor-critic algorithms with an etiquette for zero-sum Markov games | Sep 29, 2021 | Policy Gradient Methods | —Unverified | 0 | 0 |
| Sample-efficient Deep Reinforcement Learning for Dialog Control | Dec 18, 2016 | Deep Reinforcement Learning, Policy Gradient Methods | —Unverified | 0 | 0 |
| Sample Efficient Reinforcement Learning with REINFORCE | Oct 22, 2020 | Policy Gradient Methods, reinforcement-learning | —Unverified | 0 | 0 |
| Only Relevant Information Matters: Filtering Out Noisy Samples to Boost RL | Apr 8, 2019 | continuous-control, Continuous Control | —Unverified | 0 | 0 |
| Score-Aware Policy-Gradient Methods and Performance Guarantees using Local Lyapunov Conditions: Applications to Product-Form Stochastic Networks and Queueing Systems | Dec 5, 2023 | Form, Model-based Reinforcement Learning | —Unverified | 0 | 0 |
| Self-Evolving Curriculum for LLM Reasoning | May 20, 2025 | Code Generation, Policy Gradient Methods | —Unverified | 0 | 0 |
| Self-Interested Agents in Collaborative Learning: An Incentivized Adaptive Data-Centric Framework | Dec 9, 2024 | Bilevel Optimization, Policy Gradient Methods | —Unverified | 0 | 0 |
| Self-Supervised Continuous Control without Policy Gradient | Jan 1, 2021 | continuous-control, Continuous Control | —Unverified | 0 | 0 |
| Semi-On-Policy Training for Sample Efficient Multi-Agent Policy Gradients | Apr 27, 2021 | Multi-agent Reinforcement Learning, Policy Gradient Methods | —Unverified | 0 | 0 |
| Shattering the Agent-Environment Interface for Fine-Tuning Inclusive Language Models | May 19, 2023 | Efficient Exploration, Language Modeling | —Unverified | 0 | 0 |
| Similarities between policy gradient methods (PGM) in Reinforcement learning (RL) and supervised learning (SL) | Apr 12, 2019 | Decision Making, Policy Gradient Methods | —Unverified | 0 | 0 |
| Softmax Policy Gradient Methods Can Take Exponential Time to Converge | Feb 22, 2021 | Policy Gradient Methods | —Unverified | 0 | 0 |
| SoftTreeMax: Exponential Variance Reduction in Policy Gradient via Tree Search | Jan 30, 2023 | GPU, Policy Gradient Methods | —Unverified | 0 | 0 |
| SoftTreeMax: Policy Gradient with Tree Search | Sep 28, 2022 | Policy Gradient Methods | —Unverified | 0 | 0 |
| Solving Robust MDPs through No-Regret Dynamics | May 30, 2023 | Navigate, Policy Gradient Methods | —Unverified | 0 | 0 |
| Solving Rubik's Cube Without Tricky Sampling | Nov 29, 2024 | Policy Gradient Methods, Reinforcement Learning (RL) | —Unverified | 0 | 0 |
| Solving Zero-Sum Convex Markov Games | Jun 19, 2025 | Policy Gradient Methods | —Unverified | 0 | 0 |
| SPPD: Self-training with Process Preference Learning Using Dynamic Value Margin | Feb 19, 2025 | GPU, Logical Reasoning | —Unverified | 0 | 0 |
| Stabilizing Dynamical Systems via Policy Gradient Methods | Oct 13, 2021 | Policy Gradient Methods | —Unverified | 0 | 0 |
| Stabilizing Policy Gradients for Stochastic Differential Equations via Consistency with Perturbation Process | Mar 7, 2024 | Drug Design, Policy Gradient Methods | —Unverified | 0 | 0 |
| StartNet: Online Detection of Action Start in Untrimmed Videos | Mar 23, 2019 | Action Classification, Policy Gradient Methods | —Unverified | 0 | 0 |
| Statistically Efficient Off-Policy Policy Gradients | Feb 10, 2020 | Policy Gradient Methods, Reinforcement Learning | —Unverified | 0 | 0 |
| Stein Variational Policy Gradient | Apr 7, 2017 | Bayesian Inference, continuous-control | —Unverified | 0 | 0 |
| Stepsize Learning for Policy Gradient Methods in Contextual Markov Decision Processes | Jun 13, 2023 | Meta Reinforcement Learning, Policy Gradient Methods | —Unverified | 0 | 0 |
| Stochastic Dimension-reduced Second-order Methods for Policy Optimization | Jan 28, 2023 | Policy Gradient Methods, Second-order methods | —Unverified | 0 | 0 |
| Stochastic first-order methods for average-reward Markov decision processes | May 11, 2022 | Policy Gradient Methods | —Unverified | 0 | 0 |
| Stochastic Policy Gradient Methods: Improved Sample Complexity for Fisher-non-degenerate Policies | Feb 3, 2023 | Policy Gradient Methods | —Unverified | 0 | 0 |