SOTAVerified

Policy Gradient Methods

Papers

Showing 51-100 of 382 papers

| Title | Status | Hype |
| --- | --- | --- |
| Entropy annealing for policy mirror descent in continuous time and space | | 0 |
| Mollification Effects of Policy Gradient Methods | | 0 |
| Linear Function Approximation as a Computationally Efficient Method to Solve Classical Reinforcement Learning Challenges | | 0 |
| Matrix Low-Rank Approximation For Policy Gradient Methods | Code | 0 |
| Policy Gradient Methods for Risk-Sensitive Distributional Reinforcement Learning with Provable Convergence | | 0 |
| Almost sure convergence rates of stochastic gradient methods under gradient domination | | 0 |
| An Initial Introduction to Cooperative Multi-Agent Reinforcement Learning | | 0 |
| Federated Reinforcement Learning with Constraint Heterogeneity | | 0 |
| Off-OAB: Off-Policy Policy Gradient Method with Optimal Action-Dependent Baseline | | 0 |
| Information-Theoretic Opacity-Enforcement in Markov Decision Processes | | 0 |
| Control randomisation approach for policy gradient and application to reinforcement learning in optimal switching | | 0 |
| Actor-Critic Reinforcement Learning with Phased Actor | | 0 |
| Intervention-Assisted Policy Gradient Methods for Online Stochastic Queuing Network Optimization: Technical Report | | 0 |
| Elementary Analysis of Policy Gradient Methods | | 0 |
| Self-Improvement for Neural Combinatorial Optimization: Sample without Replacement, but Improvement | Code | 1 |
| ReAct Meets ActRe: When Language Agents Enjoy Training Data Autonomy | | 0 |
| Towards Global Optimality for Practical Average Reward Reinforcement Learning without Mixing Time Oracles | | 0 |
| Global Convergence Guarantees for Federated Policy Gradient Methods with Adversaries | | 0 |
| Towards Efficient Risk-Sensitive Policy Gradient: An Iteration Complexity Analysis | | 0 |
| Provable Policy Gradient Methods for Average-Reward Markov Potential Games | | 0 |
| Fill-and-Spill: Deep Reinforcement Learning Policy Gradient Methods for Reservoir Operation Decision and Control | | 0 |
| Stabilizing Policy Gradients for Stochastic Differential Equations via Consistency with Perturbation Process | | 0 |
| Towards Provable Log Density Policy Gradient | | 0 |
| Reusing Historical Trajectories in Natural Policy Gradient via Importance Sampling: Convergence and Convergence Rate | | 0 |
| When Do Off-Policy and On-Policy Policy Gradient Methods Align? | | 0 |
| Identifying Policy Gradient Subspaces | | 0 |
| Global Convergence of Natural Policy Gradient with Hessian-aided Momentum Variance Reduction | | 0 |
| Training Diffusion Models Towards Diverse Image Generation with Reinforcement Learning | | 0 |
| Optimistic Policy Gradient in Multi-Player Markov Games with a Single Controller: Convergence Beyond the Minty Property | | 0 |
| Privacy Preserving Multi-Agent Reinforcement Learning in Supply Chains | | 0 |
| RL Dreams: Policy Gradient Optimization for Score Distillation based 3D Generation | | 0 |
| Score-Aware Policy-Gradient Methods and Performance Guarantees using Local Lyapunov Conditions: Applications to Product-Form Stochastic Networks and Queueing Systems | | 0 |
| Predictable Reinforcement Learning Dynamics through Entropy Rate Minimization | Code | 0 |
| A Large Deviations Perspective on Policy Gradient Algorithms | | 0 |
| Clipped-Objective Policy Gradients for Pessimistic Policy Optimization | Code | 0 |
| On the Second-Order Convergence of Biased Policy Gradient Algorithms | | 0 |
| Riemannian stochastic optimization methods avoid strict saddle points | | 0 |
| Federated Natural Policy Gradient and Actor Critic Methods for Multi-task Reinforcement Learning | | 0 |
| Optimization Landscape of Policy Gradient Methods for Discrete-time Static Output Feedback | | 0 |
| Accelerated Policy Gradient: On the Convergence Rates of the Nesterov Momentum for Reinforcement Learning | Code | 0 |
| f-Policy Gradients: A General Framework for Goal Conditioned RL using f-Divergences | | 0 |
| Global Convergence of Policy Gradient Methods in Reinforcement Learning, Games and Control | | 0 |
| Optimizing Solution-Samplers for Combinatorial Problems: The Landscape of Policy-Gradient Methods | | 0 |
| Beyond Stationarity: Convergence Analysis of Stochastic Softmax Policy Gradient Methods | | 0 |
| Sample Complexity of Neural Policy Mirror Descent for Policy Optimization on Low-Dimensional Manifolds | | 0 |
| Oracle Complexity Reduction for Model-free LQR: A Stochastic Variance-Reduced Policy Gradient Approach | Code | 0 |
| Learning Zero-Sum Linear Quadratic Games with Improved Sample Complexity and Last-Iterate Convergence | Code | 0 |
| Commodities Trading through Deep Policy Gradient Methods | | 0 |
| Hindsight-DICE: Stable Credit Assignment for Deep Reinforcement Learning | Code | 0 |
| Enabling Efficient, Reliable Real-World Reinforcement Learning with Approximate Physics-Based Models | Code | 0 |
Page 2 of 8

No leaderboard results yet.