SOTAVerified

Policy Gradient Methods

Papers

Showing 151–200 of 382 papers

| Title | Status | Hype |
| --- | --- | --- |
| On the Convergence of Discounted Policy Gradient Methods | | 0 |
| Policy Gradient in Robust MDPs with Global Convergence Guarantee | Code | 0 |
| An Improved Analysis of (Variance-Reduced) Policy Gradient and Natural Policy Gradient Methods | | 0 |
| Geometry and convergence of natural policy gradient methods | | 0 |
| Convergence of policy gradient methods for finite-horizon exploratory linear-quadratic control problems | | 0 |
| Symmetric (Optimistic) Natural Policy Gradient for Multi-agent Learning with Parameter Convergence | | 0 |
| Policy Gradient Methods for Designing Dynamic Output Feedback Controllers | | 0 |
| On the convergence of policy gradient methods to Nash equilibria in general stochastic games | | 0 |
| Linear Convergence of Natural Policy Gradient Methods with Log-Linear Policies | | 0 |
| SoftTreeMax: Policy Gradient with Tree Search | | 0 |
| Asynchronous Actor-Critic for Multi-Agent Reinforcement Learning | | 0 |
| On the Optimization Landscape of Dynamic Output Feedback: A Case Study for Linear Quadratic Regulator | | 0 |
| The Performance Impact of Combining Agent Factorization with Different Learning Algorithms for Multiagent Coordination | Code | 0 |
| Natural Policy Gradients In Reinforcement Learning Explained | | 0 |
| Towards Global Optimality in Cooperative MARL with the Transformation And Distillation Framework | | 0 |
| Convergence and Price of Anarchy Guarantees of the Softmax Policy Gradient in Markov Potential Games | | 0 |
| How are policy gradient methods affected by the limits of control? | | 0 |
| Variance Reduction for Policy-Gradient Methods via Empirical Variance Minimization | | 0 |
| Learning Dynamics and Generalization in Reinforcement Learning | | 0 |
| Stochastic Second-Order Methods Improve Best-Known Sample Complexity of SGD for Gradient-Dominated Function | | 0 |
| Momentum-Based Policy Gradient with Second-Order Information | | 0 |
| Stochastic first-order methods for average-reward Markov decision processes | | 0 |
| Learning to Constrain Policy Optimization with Virtual Trust Region | | 0 |
| Independent Natural Policy Gradient Methods for Potential Games: Finite-time Global Convergence with Entropy Regularization | | 0 |
| Synthesis of Stabilizing Recurrent Equilibrium Network Controllers | Code | 0 |
| Asynchronous, Option-Based Multi-Agent Policy Gradient: A Conditional Reasoning Approach | | 0 |
| Non-Parametric Stochastic Policy Gradient with Strategic Retreat for Non-Stationary Environment | | 0 |
| Linear convergence of a policy gradient method for some finite horizon continuous time control problems | | 0 |
| Policy Learning and Evaluation with Randomized Quasi-Monte Carlo | | 0 |
| Independent Policy Gradient for Large-Scale Markov Potential Games: Sharper Rates, Function Approximation, and Game-Agnostic Convergence | | 0 |
| PAGE-PG: A Simple and Loopless Variance-Reduced Policy Gradient Method with Probabilistic Gradient Estimation | | 0 |
| Leveraging class abstraction for commonsense reinforcement learning via residual policy gradient methods | Code | 0 |
| Homotopic Policy Mirror Descent: Policy Convergence, Implicit Regularization, and Improved Sample Complexity | | 0 |
| Understanding the Effects of Second-Order Approximations in Natural Policy Gradient Reinforcement Learning | Code | 0 |
| On the Convergence Rates of Policy Gradient Methods | | 0 |
| Reinforcement Learning based Sequential Batch-sampling for Bayesian Optimal Experimental Design | | 0 |
| MDPGT: Momentum-based Decentralized Policy Gradient Tracking | Code | 0 |
| Global Convergence Using Policy Gradient Methods for Model-free Markovian Jump Linear Quadratic Control | | 0 |
| Time Discretization-Invariant Safe Action Repetition for Policy Gradient Methods | Code | 0 |
| Global Optimality and Finite Sample Analysis of Softmax Off-Policy Actor Critic under State Distribution Mismatch | Code | 0 |
| Proximal Policy Optimization with Continuous Bounded Action Space via the Beta Distribution | | 0 |
| Convergence and Optimality of Policy Gradient Methods in Weakly Smooth Settings | | 0 |
| Beyond Exact Gradients: Convergence of Stochastic Soft-Max Policy Gradient Methods with Entropy Regularization | | 0 |
| Local Advantage Actor-Critic for Robust Multi-Agent Deep Reinforcement Learning | | 0 |
| Stabilizing Dynamical Systems via Policy Gradient Methods | | 0 |
| Programmatic Reinforcement Learning without Oracles | | 0 |
| Variance Reduced Domain Randomization for Policy Gradient | | 0 |
| Efficient Wasserstein and Sinkhorn Policy Optimization | | 0 |
| Sample-efficient actor-critic algorithms with an etiquette for zero-sum Markov games | | 0 |
| Actor-Critic Policy Optimization in a Large-Scale Imperfect-Information Game | | 0 |

No leaderboard results yet.