| Improving DAPO from a Mixed-Policy Perspective | Jul 17, 2025 | Policy Gradient Methods | Unverified | 0 |
| Local Pairwise Distance Matching for Backpropagation-Free Reinforcement Learning | Jul 15, 2025 | Policy Gradient Methods, reinforcement-learning | Unverified | 0 |
| Solving Zero-Sum Convex Markov Games | Jun 19, 2025 | Policy Gradient Methods | Unverified | 0 |
| Equivalence of stochastic and deterministic policy gradients | May 29, 2025 | Continuous Control | Unverified | 0 |
| Enhanced DACER Algorithm with High Diffusion Efficiency | May 29, 2025 | Denoising, Imitation Learning | Unverified | 0 |
| On Global Convergence Rates for Federated Policy Gradient under Heterogeneous Environment | May 29, 2025 | Federated Learning, Policy Gradient Methods | Unverified | 0 |
| Learning from Algorithm Feedback: One-Shot SAT Solver Guidance with GNNs | May 21, 2025 | Combinatorial Optimization, Policy Gradient Methods | Unverified | 0 |
| Policy Testing in Markov Decision Processes | May 21, 2025 | Policy Gradient Methods | Unverified | 0 |
| Self-Evolving Curriculum for LLM Reasoning | May 20, 2025 | Code Generation, Policy Gradient Methods | Unverified | 0 |
| KIPPO: Koopman-Inspired Proximal Policy Optimization | May 20, 2025 | Computational Efficiency, continuous-control | Unverified | 0 |
| Direct Retrieval-augmented Optimization: Synergizing Knowledge Selection and Language Models | May 5, 2025 | Policy Gradient Methods, RAG | Code Available | 3 |
| Token-Efficient RL for LLM Reasoning | Apr 29, 2025 | Policy Gradient Methods, Reinforcement Learning (RL) | Unverified | 0 |
| Evolutionary Policy Optimization | Apr 17, 2025 | Policy Gradient Methods, Reinforcement Learning (RL) | Unverified | 0 |
| Hierarchical Policy-Gradient Reinforcement Learning for Multi-Agent Shepherding Control of Non-Cohesive Targets | Apr 3, 2025 | Policy Gradient Methods, reinforcement-learning | Code Available | 0 |
| Ordering-based Conditions for Global Convergence of Policy Gradient Methods | Apr 2, 2025 | Policy Gradient Methods | Unverified | 0 |
| Analysis of On-policy Policy Gradient Methods under the Distribution Mismatch | Mar 28, 2025 | Policy Gradient Methods | Unverified | 0 |
| Residual Policy Gradient: A Reward View of KL-regularized Objective | Mar 14, 2025 | Imitation Learning, MuJoCo | Unverified | 0 |
| ROCM: RLHF on consistency models | Mar 8, 2025 | Policy Gradient Methods | Unverified | 0 |
| Convergence Guarantees of Model-free Policy Gradient Methods for LQR with Stochastic Data | Feb 27, 2025 | Policy Gradient Methods | Code Available | 0 |
| SPPD: Self-training with Process Preference Learning Using Dynamic Value Margin | Feb 19, 2025 | GPU, Logical Reasoning | Unverified | 0 |
| A Self-Supervised Reinforcement Learning Approach for Fine-Tuning Large Language Models Using Cross-Attention Signals | Feb 14, 2025 | Policy Gradient Methods | Unverified | 0 |
| Reevaluating Policy Gradient Methods for Imperfect-Information Games | Feb 13, 2025 | counterfactual, Deep Reinforcement Learning | Code Available | 1 |
| Fine-Tuning Discrete Diffusion Models with Policy Gradient Methods | Feb 3, 2025 | Language Modeling | Code Available | 1 |
| Computing and Learning Stationary Mean Field Equilibria with Scalar Interactions: Algorithms and Applications | Feb 2, 2025 | counterfactual, Policy Gradient Methods | Unverified | 0 |
| Metastable Dynamics of Chain-of-Thought Reasoning: Provable Benefits of Search, RL and Distillation | Feb 2, 2025 | Policy Gradient Methods | Unverified | 0 |