SOTAVerified

Policy Gradient Methods

Papers

Showing 101-150 of 382 papers

Title | Status | Hype
Reusing Historical Trajectories in Natural Policy Gradient via Importance Sampling: Convergence and Convergence Rate | - | 0
When Do Off-Policy and On-Policy Policy Gradient Methods Align? | - | 0
Identifying Policy Gradient Subspaces | - | 0
Global Convergence of Natural Policy Gradient with Hessian-aided Momentum Variance Reduction | - | 0
Training Diffusion Models Towards Diverse Image Generation with Reinforcement Learning | - | 0
Optimistic Policy Gradient in Multi-Player Markov Games with a Single Controller: Convergence Beyond the Minty Property | - | 0
Privacy Preserving Multi-Agent Reinforcement Learning in Supply Chains | - | 0
RL Dreams: Policy Gradient Optimization for Score Distillation based 3D Generation | - | 0
Score-Aware Policy-Gradient Methods and Performance Guarantees using Local Lyapunov Conditions: Applications to Product-Form Stochastic Networks and Queueing Systems | - | 0
Predictable Reinforcement Learning Dynamics through Entropy Rate Minimization | Code | 0
A Large Deviations Perspective on Policy Gradient Algorithms | - | 0
Clipped-Objective Policy Gradients for Pessimistic Policy Optimization | Code | 0
On the Second-Order Convergence of Biased Policy Gradient Algorithms | - | 0
Riemannian stochastic optimization methods avoid strict saddle points | - | 0
Federated Natural Policy Gradient and Actor Critic Methods for Multi-task Reinforcement Learning | - | 0
Optimization Landscape of Policy Gradient Methods for Discrete-time Static Output Feedback | - | 0
Accelerated Policy Gradient: On the Convergence Rates of the Nesterov Momentum for Reinforcement Learning | Code | 0
f-Policy Gradients: A General Framework for Goal Conditioned RL using f-Divergences | - | 0
Optimizing Solution-Samplers for Combinatorial Problems: The Landscape of Policy-Gradient Methods | - | 0
Global Convergence of Policy Gradient Methods in Reinforcement Learning, Games and Control | - | 0
Beyond Stationarity: Convergence Analysis of Stochastic Softmax Policy Gradient Methods | - | 0
Sample Complexity of Neural Policy Mirror Descent for Policy Optimization on Low-Dimensional Manifolds | - | 0
Oracle Complexity Reduction for Model-free LQR: A Stochastic Variance-Reduced Policy Gradient Approach | Code | 0
Learning Zero-Sum Linear Quadratic Games with Improved Sample Complexity and Last-Iterate Convergence | Code | 0
Commodities Trading through Deep Policy Gradient Methods | - | 0
Hindsight-DICE: Stable Credit Assignment for Deep Reinforcement Learning | Code | 0
Enabling Efficient, Reliable Real-World Reinforcement Learning with Approximate Physics-Based Models | Code | 0
Learning Decentralized Partially Observable Mean Field Control for Artificial Collective Behavior | - | 0
Provably Convergent Policy Optimization via Metric-aware Trust Region Methods | - | 0
Correcting discount-factor mismatch in on-policy policy gradient methods | - | 0
Acceleration in Policy Optimization | - | 0
Deep Policy Gradient Methods in Commodity Markets | - | 0
Stepsize Learning for Policy Gradient Methods in Contextual Markov Decision Processes | - | 0
Confidence-Controlled Exploration: Efficient Sparse-Reward Policy Learning for Robot Navigation | - | 0
Solving Robust MDPs through No-Regret Dynamics | - | 0
Adaptive Policy Learning to Additional Tasks | - | 0
Shattering the Agent-Environment Interface for Fine-Tuning Inclusive Language Models | - | 0
Client Selection for Federated Policy Optimization with Environment Heterogeneity | Code | 0
Policy Gradient Methods for Discrete Time Linear Quadratic Regulator With Random Parameters | - | 0
Policy Mirror Descent Inherently Explores Action Space | - | 0
Policy gradient learning methods for stochastic control with exit time and applications to share repurchase pricing | - | 0
A Policy Gradient Framework for Stochastic Optimal Control Problems with Global Convergence Guarantee | - | 0
Distributional constrained reinforcement learning for supply chain optimization | Code | 0
Stochastic Policy Gradient Methods: Improved Sample Complexity for Fisher-non-degenerate Policies | - | 0
Accelerating Policy Gradient by Estimating Value Function from Prior Computation in Deep Reinforcement Learning | - | 0
Policy Gradient for Rectangular Robust Markov Decision Processes | - | 0
SoftTreeMax: Exponential Variance Reduction in Policy Gradient via Tree Search | - | 0
Stochastic Dimension-reduced Second-order Methods for Policy Optimization | - | 0
On the Global Convergence of Risk-Averse Policy Gradient Methods with Expected Conditional Risk Measures | - | 0
On the Convergence of Discounted Policy Gradient Methods | - | 0

No leaderboard results yet.