| Direct Retrieval-augmented Optimization: Synergizing Knowledge Selection and Language Models | May 5, 2025 | Policy Gradient Methods, RAG | Code Available | 3 |
| Token-Efficient RL for LLM Reasoning | Apr 29, 2025 | Policy Gradient Methods, Reinforcement Learning (RL) | Unverified | 0 |
| Evolutionary Policy Optimization | Apr 17, 2025 | Policy Gradient Methods, Reinforcement Learning (RL) | Unverified | 0 |
| Hierarchical Policy-Gradient Reinforcement Learning for Multi-Agent Shepherding Control of Non-Cohesive Targets | Apr 3, 2025 | Policy Gradient Methods, reinforcement-learning | Code Available | 0 |
| Ordering-based Conditions for Global Convergence of Policy Gradient Methods | Apr 2, 2025 | Policy Gradient Methods | Unverified | 0 |
| Analysis of On-policy Policy Gradient Methods under the Distribution Mismatch | Mar 28, 2025 | Policy Gradient Methods | Unverified | 0 |
| Residual Policy Gradient: A Reward View of KL-regularized Objective | Mar 14, 2025 | Imitation Learning, MuJoCo | Unverified | 0 |
| ROCM: RLHF on consistency models | Mar 8, 2025 | Policy Gradient Methods | Unverified | 0 |
| Convergence Guarantees of Model-free Policy Gradient Methods for LQR with Stochastic Data | Feb 27, 2025 | Policy Gradient Methods | Code Available | 0 |
| SPPD: Self-training with Process Preference Learning Using Dynamic Value Margin | Feb 19, 2025 | GPU, Logical Reasoning | Unverified | 0 |