| Improving DAPO from a Mixed-Policy Perspective | Jul 17, 2025 | Policy Gradient Methods | Unverified | 0 |
| Local Pairwise Distance Matching for Backpropagation-Free Reinforcement Learning | Jul 15, 2025 | Policy Gradient Methods, reinforcement-learning | Unverified | 0 |
| Solving Zero-Sum Convex Markov Games | Jun 19, 2025 | Policy Gradient Methods | Unverified | 0 |
| Enhanced DACER Algorithm with High Diffusion Efficiency | May 29, 2025 | Denoising, Imitation Learning | Unverified | 0 |
| Equivalence of stochastic and deterministic policy gradients | May 29, 2025 | continuous-control, Continuous Control | Unverified | 0 |
| On Global Convergence Rates for Federated Policy Gradient under Heterogeneous Environment | May 29, 2025 | Federated Learning, Policy Gradient Methods | Unverified | 0 |
| Learning from Algorithm Feedback: One-Shot SAT Solver Guidance with GNNs | May 21, 2025 | Combinatorial Optimization, Policy Gradient Methods | Unverified | 0 |
| Policy Testing in Markov Decision Processes | May 21, 2025 | Policy Gradient Methods | Unverified | 0 |
| KIPPO: Koopman-Inspired Proximal Policy Optimization | May 20, 2025 | Computational Efficiency, continuous-control | Unverified | 0 |
| Self-Evolving Curriculum for LLM Reasoning | May 20, 2025 | Code Generation, Policy Gradient Methods | Unverified | 0 |
| Direct Retrieval-augmented Optimization: Synergizing Knowledge Selection and Language Models | May 5, 2025 | Policy Gradient Methods, RAG | Code Available | 3 |
| Token-Efficient RL for LLM Reasoning | Apr 29, 2025 | Policy Gradient Methods, Reinforcement Learning (RL) | Unverified | 0 |
| Evolutionary Policy Optimization | Apr 17, 2025 | Policy Gradient Methods, Reinforcement Learning (RL) | Unverified | 0 |
| Hierarchical Policy-Gradient Reinforcement Learning for Multi-Agent Shepherding Control of Non-Cohesive Targets | Apr 3, 2025 | Policy Gradient Methods, reinforcement-learning | Code Available | 0 |
| Ordering-based Conditions for Global Convergence of Policy Gradient Methods | Apr 2, 2025 | Policy Gradient Methods | Unverified | 0 |
| Analysis of On-policy Policy Gradient Methods under the Distribution Mismatch | Mar 28, 2025 | Policy Gradient Methods | Unverified | 0 |
| Residual Policy Gradient: A Reward View of KL-regularized Objective | Mar 14, 2025 | Imitation Learning, MuJoCo | Unverified | 0 |
| ROCM: RLHF on consistency models | Mar 8, 2025 | Policy Gradient Methods | Unverified | 0 |
| Convergence Guarantees of Model-free Policy Gradient Methods for LQR with Stochastic Data | Feb 27, 2025 | Policy Gradient Methods | Code Available | 0 |
| SPPD: Self-training with Process Preference Learning Using Dynamic Value Margin | Feb 19, 2025 | GPU, Logical Reasoning | Unverified | 0 |
| A Self-Supervised Reinforcement Learning Approach for Fine-Tuning Large Language Models Using Cross-Attention Signals | Feb 14, 2025 | Policy Gradient Methods | Unverified | 0 |
| Reevaluating Policy Gradient Methods for Imperfect-Information Games | Feb 13, 2025 | counterfactual, Deep Reinforcement Learning | Code Available | 1 |
| Fine-Tuning Discrete Diffusion Models with Policy Gradient Methods | Feb 3, 2025 | Language Modeling, Language Modelling | Code Available | 1 |
| Computing and Learning Stationary Mean Field Equilibria with Scalar Interactions: Algorithms and Applications | Feb 2, 2025 | counterfactual, Policy Gradient Methods | Unverified | 0 |
| Metastable Dynamics of Chain-of-Thought Reasoning: Provable Benefits of Search, RL and Distillation | Feb 2, 2025 | Policy Gradient Methods | Unverified | 0 |
| Divergence-Augmented Policy Optimization | Jan 25, 2025 | Atari Games, Deep Reinforcement Learning | Code Available | 1 |
| An Attentive Graph Agent for Topology-Adaptive Cyber Defence | Jan 24, 2025 | Graph Attention, Graph Neural Network | Code Available | 1 |
| Multilinear Tensor Low-Rank Approximation for Policy-Gradient Methods in Reinforcement Learning | Jan 8, 2025 | Policy Gradient Methods, Reinforcement Learning (RL) | Code Available | 0 |
| Self-Interested Agents in Collaborative Learning: An Incentivized Adaptive Data-Centric Framework | Dec 9, 2024 | Bilevel Optimization, Policy Gradient Methods | Unverified | 0 |
| Reinforcement Learning: An Overview | Dec 6, 2024 | Decision Making, Deep Reinforcement Learning | Code Available | 0 |
| BOTS: Batch Bayesian Optimization of Extended Thompson Sampling for Severely Episode-Limited RL Settings | Nov 30, 2024 | Bayesian Optimization, Policy Gradient Methods | Unverified | 0 |
| Solving Rubik's Cube Without Tricky Sampling | Nov 29, 2024 | Policy Gradient Methods, Reinforcement Learning (RL) | Unverified | 0 |
| Deep Policy Gradient Methods Without Batch Updates, Target Networks, or Replay Buffers | Nov 22, 2024 | Avg, Deep Reinforcement Learning | Code Available | 1 |
| Q-SFT: Q-Learning for Language Models via Supervised Fine-Tuning | Nov 7, 2024 | Offline RL, Policy Gradient Methods | Unverified | 0 |
| Policy Gradient for Robust Markov Decision Processes | Oct 29, 2024 | Policy Gradient Methods | Code Available | 0 |
| Truncating Trajectories in Monte Carlo Policy Evaluation: an Adaptive Approach | Oct 17, 2024 | Policy Gradient Methods, Reinforcement Learning (RL) | Unverified | 0 |
| StepTool: A Step-grained Reinforcement Learning Framework for Tool Learning in LLMs | Oct 10, 2024 | Information Retrieval, Policy Gradient Methods | Code Available | 1 |
| Learning in complex action spaces without policy gradients | Oct 8, 2024 | Policy Gradient Methods, Q-Learning | Unverified | 0 |
| Strongly-polynomial time and validation analysis of policy gradient methods | Sep 28, 2024 | Policy Gradient Methods, Reinforcement Learning (RL) | Unverified | 0 |
| Landscape of Policy Optimization for Finite Horizon MDPs with General State and Action | Sep 25, 2024 | Policy Gradient Methods | Unverified | 0 |
| Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form | Aug 29, 2024 | Form, Policy Gradient Methods | Code Available | 0 |
| Reinforcement Learning for Causal Discovery without Acyclicity Constraints | Aug 24, 2024 | Causal Discovery, Efficient Exploration | Unverified | 0 |
| Deterministic Policy Gradient Primal-Dual Methods for Continuous-Space Constrained MDPs | Aug 19, 2024 | continuous-control, Continuous Control | Unverified | 0 |
| From Imitation to Refinement -- Residual RL for Precise Assembly | Jul 23, 2024 | Chunking, Policy Gradient Methods | Unverified | 0 |
| PG-Rainbow: Using Distributional Reinforcement Learning in Policy Gradient Methods | Jul 18, 2024 | Atari Games, Decision Making | Unverified | 0 |
| Towards Adapting Reinforcement Learning Agents to New Tasks: Insights from Q-Values | Jul 14, 2024 | Policy Gradient Methods, reinforcement-learning | Unverified | 0 |
| Augmented Bayesian Policy Search | Jul 5, 2024 | Bayesian Optimization, LEMMA | Unverified | 0 |
| Improving Reward-Conditioned Policies for Multi-Armed Bandits using Normalized Weight Functions | Jun 16, 2024 | Multi-Armed Bandits, Policy Gradient Methods | Unverified | 0 |
| Current applications and potential future directions of reinforcement learning-based Digital Twins in agriculture | Jun 13, 2024 | Decision Making, Management | Unverified | 0 |
| Optimal Rates of Convergence for Entropy Regularization in Discounted Markov Decision Processes | Jun 6, 2024 | Policy Gradient Methods | Unverified | 0 |