SOTAVerified

Policy Gradient Methods

Papers

Showing 1-50 of 382 papers

Title | Status | Hype
Improving DAPO from a Mixed-Policy Perspective | - | 0
Local Pairwise Distance Matching for Backpropagation-Free Reinforcement Learning | - | 0
Solving Zero-Sum Convex Markov Games | - | 0
Enhanced DACER Algorithm with High Diffusion Efficiency | - | 0
Equivalence of stochastic and deterministic policy gradients | - | 0
On Global Convergence Rates for Federated Policy Gradient under Heterogeneous Environment | - | 0
Learning from Algorithm Feedback: One-Shot SAT Solver Guidance with GNNs | - | 0
Policy Testing in Markov Decision Processes | - | 0
KIPPO: Koopman-Inspired Proximal Policy Optimization | - | 0
Self-Evolving Curriculum for LLM Reasoning | - | 0
Direct Retrieval-augmented Optimization: Synergizing Knowledge Selection and Language Models | Code | 3
Token-Efficient RL for LLM Reasoning | - | 0
Evolutionary Policy Optimization | - | 0
Hierarchical Policy-Gradient Reinforcement Learning for Multi-Agent Shepherding Control of Non-Cohesive Targets | Code | 0
Ordering-based Conditions for Global Convergence of Policy Gradient Methods | - | 0
Analysis of On-policy Policy Gradient Methods under the Distribution Mismatch | - | 0
Residual Policy Gradient: A Reward View of KL-regularized Objective | - | 0
ROCM: RLHF on consistency models | - | 0
Convergence Guarantees of Model-free Policy Gradient Methods for LQR with Stochastic Data | Code | 0
SPPD: Self-training with Process Preference Learning Using Dynamic Value Margin | - | 0
A Self-Supervised Reinforcement Learning Approach for Fine-Tuning Large Language Models Using Cross-Attention Signals | - | 0
Reevaluating Policy Gradient Methods for Imperfect-Information Games | Code | 1
Fine-Tuning Discrete Diffusion Models with Policy Gradient Methods | Code | 1
Computing and Learning Stationary Mean Field Equilibria with Scalar Interactions: Algorithms and Applications | - | 0
Metastable Dynamics of Chain-of-Thought Reasoning: Provable Benefits of Search, RL and Distillation | - | 0
Divergence-Augmented Policy Optimization | Code | 1
An Attentive Graph Agent for Topology-Adaptive Cyber Defence | Code | 1
Multilinear Tensor Low-Rank Approximation for Policy-Gradient Methods in Reinforcement Learning | Code | 0
Self-Interested Agents in Collaborative Learning: An Incentivized Adaptive Data-Centric Framework | - | 0
Reinforcement Learning: An Overview | Code | 0
BOTS: Batch Bayesian Optimization of Extended Thompson Sampling for Severely Episode-Limited RL Settings | - | 0
Solving Rubik's Cube Without Tricky Sampling | - | 0
Deep Policy Gradient Methods Without Batch Updates, Target Networks, or Replay Buffers | Code | 1
Q-SFT: Q-Learning for Language Models via Supervised Fine-Tuning | - | 0
Policy Gradient for Robust Markov Decision Processes | Code | 0
Truncating Trajectories in Monte Carlo Policy Evaluation: an Adaptive Approach | - | 0
StepTool: A Step-grained Reinforcement Learning Framework for Tool Learning in LLMs | Code | 1
Learning in complex action spaces without policy gradients | - | 0
Strongly-polynomial time and validation analysis of policy gradient methods | - | 0
Landscape of Policy Optimization for Finite Horizon MDPs with General State and Action | - | 0
Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form | Code | 0
Reinforcement Learning for Causal Discovery without Acyclicity Constraints | - | 0
Deterministic Policy Gradient Primal-Dual Methods for Continuous-Space Constrained MDPs | - | 0
From Imitation to Refinement -- Residual RL for Precise Assembly | - | 0
PG-Rainbow: Using Distributional Reinforcement Learning in Policy Gradient Methods | - | 0
Towards Adapting Reinforcement Learning Agents to New Tasks: Insights from Q-Values | - | 0
Augmented Bayesian Policy Search | - | 0
Improving Reward-Conditioned Policies for Multi-Armed Bandits using Normalized Weight Functions | - | 0
Current applications and potential future directions of reinforcement learning-based Digital Twins in agriculture | - | 0
Optimal Rates of Convergence for Entropy Regularization in Discounted Markov Decision Processes | - | 0

No leaderboard results yet.