SOTAVerified

Policy Gradient Methods

Papers

Showing 301-350 of 382 papers

Title | Status | Hype
Policy Tree Network | | 0
Predicting Multiple Actions for Stochastic Continuous Control | | 0
On the Second-Order Convergence of Biased Policy Gradient Algorithms | | 0
Privacy Preserving Multi-Agent Reinforcement Learning in Supply Chains | | 0
Programmatic Reinforcement Learning without Oracles | | 0
Provable Policy Gradient Methods for Average-Reward Markov Potential Games | | 0
Provably Convergent Policy Optimization via Metric-aware Trust Region Methods | | 0
Provably Efficient Policy Optimization for Two-Player Zero-Sum Markov Games | | 0
Proximal Policy Optimization for Tracking Control Exploiting Future Reference Information | | 0
Proximal Policy Optimization with Continuous Bounded Action Space via the Beta Distribution | | 0
Q-SFT: Q-Learning for Language Models via Supervised Fine-Tuning | | 0
ReAct Meets ActRe: When Language Agents Enjoy Training Data Autonomy | | 0
Reinforcement Learning: An Overview | | 0
Reinforcement Learning based Sequential Batch-sampling for Bayesian Optimal Experimental Design | | 0
Reinforcement Learning in Linear Quadratic Deep Structured Teams: Global Convergence of Policy Gradient Methods | | 0
Residual Policy Gradient: A Reward View of KL-regularized Objective | | 0
Fast Efficient Hyperparameter Tuning for Policy Gradient Methods | Code | 0
Learning Zero-Sum Linear Quadratic Games with Improved Sample Complexity and Last-Iterate Convergence | Code | 0
Leveraging class abstraction for commonsense reinforcement learning via residual policy gradient methods | Code | 0
Synthesis of Stabilizing Recurrent Equilibrium Network Controllers | Code | 0
Enabling Efficient, Reliable Real-World Reinforcement Learning with Approximate Physics-Based Models | Code | 0
Deep Reinforcement Learning for Dialogue Generation | Code | 0
Sample Efficient Policy Gradient Methods with Recursive Variance Reduction | Code | 0
Fast Efficient Hyperparameter Tuning for Policy Gradients | Code | 0
Action-depedent Control Variates for Policy Optimization via Stein's Identity | Code | 0
Remember and Forget for Experience Replay | Code | 0
Reproducibility of Benchmarked Deep Reinforcement Learning Tasks for Continuous Control | Code | 0
Where Did My Optimum Go?: An Empirical Analysis of Gradient Descent Optimization in Policy Gradient Methods | Code | 0
Shapley Q-value: A Local Reward Approach to Solve Global Reward Games | Code | 0
Rethinking Action Spaces for Reinforcement Learning in End-to-end Dialog Agents with Latent Variable Models | Code | 0
The Mirage of Action-Dependent Baselines in Reinforcement Learning | Code | 0
Matrix Low-Rank Approximation For Policy Gradient Methods | Code | 0
Oracle Complexity Reduction for Model-free LQR: A Stochastic Variance-Reduced Policy Gradient Approach | Code | 0
MDPGT: Momentum-based Decentralized Policy Gradient Tracking | Code | 0
Predictable Reinforcement Learning Dynamics through Entropy Rate Minimization | Code | 0
A Nonparametric Off-Policy Policy Gradient | Code | 0
Clipped-Objective Policy Gradients for Pessimistic Policy Optimization | Code | 0
Model-free and Bayesian Ensembling Model-based Deep Reinforcement Learning for Particle Accelerator Control Demonstrated on the FERMI FEL | Code | 0
Deep Reinforcement Learning Algorithm for Dynamic Pricing of Express Lanes with Multiple Access Locations | Code | 0
PC-PG: Policy Cover Directed Exploration for Provable Policy Gradient Learning | Code | 0
Accelerated Policy Gradient: On the Convergence Rates of the Nesterov Momentum for Reinforcement Learning | Code | 0
Momentum-Based Policy Gradient Methods | Code | 0
Health-Informed Policy Gradients for Multi-Agent Reinforcement Learning | Code | 0
Hierarchical Policy-Gradient Reinforcement Learning for Multi-Agent Shepherding Control of Non-Cohesive Targets | Code | 0
High-Dimensional Continuous Control Using Generalized Advantage Estimation | Code | 0
Hindsight-DICE: Stable Credit Assignment for Deep Reinforcement Learning | Code | 0
Hindsight policy gradients | Code | 0
Hindsight Trust Region Policy Optimization | Code | 0
Hindsight Value Function for Variance Reduction in Stochastic Dynamic Environment | Code | 0
A general class of surrogate functions for stable and efficient reinforcement learning | Code | 0
