| Residual Policy Gradient: A Reward View of KL-regularized Objective | Mar 14, 2025 | Imitation Learning, MuJoCo | Unverified | 0 |
| ROCM: RLHF on consistency models | Mar 8, 2025 | Policy Gradient Methods | Unverified | 0 |
| Convergence Guarantees of Model-free Policy Gradient Methods for LQR with Stochastic Data | Feb 27, 2025 | Policy Gradient Methods | Code Available | 0 |
| SPPD: Self-training with Process Preference Learning Using Dynamic Value Margin | Feb 19, 2025 | GPU, Logical Reasoning | Unverified | 0 |
| A Self-Supervised Reinforcement Learning Approach for Fine-Tuning Large Language Models Using Cross-Attention Signals | Feb 14, 2025 | Policy Gradient Methods | Unverified | 0 |
| Metastable Dynamics of Chain-of-Thought Reasoning: Provable Benefits of Search, RL and Distillation | Feb 2, 2025 | Policy Gradient Methods | Unverified | 0 |
| Computing and Learning Stationary Mean Field Equilibria with Scalar Interactions: Algorithms and Applications | Feb 2, 2025 | counterfactual, Policy Gradient Methods | Unverified | 0 |
| Multilinear Tensor Low-Rank Approximation for Policy-Gradient Methods in Reinforcement Learning | Jan 8, 2025 | Policy Gradient Methods, Reinforcement Learning (RL) | Code Available | 0 |
| Self-Interested Agents in Collaborative Learning: An Incentivized Adaptive Data-Centric Framework | Dec 9, 2024 | Bilevel Optimization, Policy Gradient Methods | Unverified | 0 |
| Reinforcement Learning: An Overview | Dec 6, 2024 | Decision Making, Deep Reinforcement Learning | Code Available | 0 |
| BOTS: Batch Bayesian Optimization of Extended Thompson Sampling for Severely Episode-Limited RL Settings | Nov 30, 2024 | Bayesian Optimization, Policy Gradient Methods | Unverified | 0 |
| Solving Rubik's Cube Without Tricky Sampling | Nov 29, 2024 | Policy Gradient Methods, Reinforcement Learning (RL) | Unverified | 0 |
| Q-SFT: Q-Learning for Language Models via Supervised Fine-Tuning | Nov 7, 2024 | Offline RL, Policy Gradient Methods | Unverified | 0 |
| Policy Gradient for Robust Markov Decision Processes | Oct 29, 2024 | Policy Gradient Methods | Code Available | 0 |
| Truncating Trajectories in Monte Carlo Policy Evaluation: an Adaptive Approach | Oct 17, 2024 | Policy Gradient Methods, Reinforcement Learning (RL) | Unverified | 0 |
| Learning in complex action spaces without policy gradients | Oct 8, 2024 | Policy Gradient Methods, Q-Learning | Unverified | 0 |
| Strongly-polynomial time and validation analysis of policy gradient methods | Sep 28, 2024 | Policy Gradient Methods, Reinforcement Learning (RL) | Unverified | 0 |
| Landscape of Policy Optimization for Finite Horizon MDPs with General State and Action | Sep 25, 2024 | Policy Gradient Methods | Unverified | 0 |
| Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form | Aug 29, 2024 | Form, Policy Gradient Methods | Code Available | 0 |
| Reinforcement Learning for Causal Discovery without Acyclicity Constraints | Aug 24, 2024 | Causal Discovery, Efficient Exploration | Unverified | 0 |
| Deterministic Policy Gradient Primal-Dual Methods for Continuous-Space Constrained MDPs | Aug 19, 2024 | continuous-control, Continuous Control | Unverified | 0 |
| From Imitation to Refinement -- Residual RL for Precise Assembly | Jul 23, 2024 | Chunking, Policy Gradient Methods | Unverified | 0 |
| PG-Rainbow: Using Distributional Reinforcement Learning in Policy Gradient Methods | Jul 18, 2024 | Atari Games, Decision Making | Unverified | 0 |
| Towards Adapting Reinforcement Learning Agents to New Tasks: Insights from Q-Values | Jul 14, 2024 | Policy Gradient Methods, reinforcement-learning | Unverified | 0 |
| Augmented Bayesian Policy Search | Jul 5, 2024 | Bayesian Optimization, LEMMA | Unverified | 0 |