| Self-Evolving Curriculum for LLM Reasoning | May 20, 2025 | Code Generation, Policy Gradient Methods | —Unverified | 0 | 0 |
| Self-Interested Agents in Collaborative Learning: An Incentivized Adaptive Data-Centric Framework | Dec 9, 2024 | Bilevel Optimization, Policy Gradient Methods | —Unverified | 0 | 0 |
| Self-Supervised Continuous Control without Policy Gradient | Jan 1, 2021 | continuous-control, Continuous Control | —Unverified | 0 | 0 |
| Semi-On-Policy Training for Sample Efficient Multi-Agent Policy Gradients | Apr 27, 2021 | Multi-agent Reinforcement Learning, Policy Gradient Methods | —Unverified | 0 | 0 |
| Shattering the Agent-Environment Interface for Fine-Tuning Inclusive Language Models | May 19, 2023 | Efficient Exploration, Language Modeling | —Unverified | 0 | 0 |
| Similarities between policy gradient methods (PGM) in Reinforcement learning (RL) and supervised learning (SL) | Apr 12, 2019 | Decision Making, Policy Gradient Methods | —Unverified | 0 | 0 |
| Softmax Policy Gradient Methods Can Take Exponential Time to Converge | Feb 22, 2021 | Policy Gradient Methods | —Unverified | 0 | 0 |
| SoftTreeMax: Exponential Variance Reduction in Policy Gradient via Tree Search | Jan 30, 2023 | GPU, Policy Gradient Methods | —Unverified | 0 | 0 |
| SoftTreeMax: Policy Gradient with Tree Search | Sep 28, 2022 | Policy Gradient Methods | —Unverified | 0 | 0 |
| Solving Robust MDPs through No-Regret Dynamics | May 30, 2023 | Navigate, Policy Gradient Methods | —Unverified | 0 | 0 |