| Residual Policy Gradient: A Reward View of KL-regularized Objective | Mar 14, 2025 | Imitation Learning, MuJoCo | Unverified | 0 |
| ROCM: RLHF on consistency models | Mar 8, 2025 | Policy Gradient Methods | Unverified | 0 |
| Convergence Guarantees of Model-free Policy Gradient Methods for LQR with Stochastic Data | Feb 27, 2025 | Policy Gradient Methods | Code Available | 0 |
| SPPD: Self-training with Process Preference Learning Using Dynamic Value Margin | Feb 19, 2025 | GPU, Logical Reasoning | Unverified | 0 |
| A Self-Supervised Reinforcement Learning Approach for Fine-Tuning Large Language Models Using Cross-Attention Signals | Feb 14, 2025 | Policy Gradient Methods | Unverified | 0 |
| Computing and Learning Stationary Mean Field Equilibria with Scalar Interactions: Algorithms and Applications | Feb 2, 2025 | counterfactual, Policy Gradient Methods | Unverified | 0 |
| Metastable Dynamics of Chain-of-Thought Reasoning: Provable Benefits of Search, RL and Distillation | Feb 2, 2025 | Policy Gradient Methods | Unverified | 0 |
| Multilinear Tensor Low-Rank Approximation for Policy-Gradient Methods in Reinforcement Learning | Jan 8, 2025 | Policy Gradient Methods, Reinforcement Learning (RL) | Code Available | 0 |
| Self-Interested Agents in Collaborative Learning: An Incentivized Adaptive Data-Centric Framework | Dec 9, 2024 | Bilevel Optimization, Policy Gradient Methods | Unverified | 0 |
| Reinforcement Learning: An Overview | Dec 6, 2024 | Decision Making, Deep Reinforcement Learning | Unverified | 0 |