SOTAVerified

Policy Gradient Methods

Papers

Showing 251–300 of 382 papers

Title | Status | Hype
Stochastic first-order methods for average-reward Markov decision processes |  | 0
Stochastic Policy Gradient Methods: Improved Sample Complexity for Fisher-non-degenerate Policies |  | 0
Stochastic Recursive Momentum for Policy Gradient Methods |  | 0
Stochastic Second-Order Methods Improve Best-Known Sample Complexity of SGD for Gradient-Dominated Function |  | 0
Stochastic Variance Reduction for Policy Gradient Estimation |  | 0
Strategic bidding in freight transport using deep reinforcement learning |  | 0
Strongly-polynomial time and validation analysis of policy gradient methods |  | 0
Symmetric (Optimistic) Natural Policy Gradient for Multi-agent Learning with Parameter Convergence |  | 0
Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning |  | 0
Theoretical Guarantees of Fictitious Discount Algorithms for Episodic Reinforcement Learning and Global Convergence of Policy Gradient Methods |  | 0
The wisdom of the crowd: reliable deep reinforcement learning through ensembles of Q-functions |  | 0
Token-Efficient RL for LLM Reasoning |  | 0
Towards Adapting Reinforcement Learning Agents to New Tasks: Insights from Q-Values |  | 0
Towards Efficient Risk-Sensitive Policy Gradient: An Iteration Complexity Analysis |  | 0
Towards Global Optimality in Cooperative MARL with the Transformation And Distillation Framework |  | 0
Towards Provable Log Density Policy Gradient |  | 0
Training Diffusion Models Towards Diverse Image Generation with Reinforcement Learning |  | 0
Trajectory-wise Control Variates for Variance Reduction in Policy Gradient Methods |  | 0
Transfer Reward Learning for Policy Gradient-Based Text Generation |  | 0
Truncating Trajectories in Monte Carlo Policy Evaluation: an Adaptive Approach |  | 0
Policy Gradient in Partially Observable Environments: Approximation and Convergence |  | 0
Understanding Early Word Learning in Situated Artificial Agents |  | 0
Understanding Grounded Language Learning Agents |  | 0
Value-Based Reinforcement Learning for Continuous Control Robotic Manipulation in Multi-Task Sparse Reward Settings |  | 0
Variance Reduced Domain Randomization for Policy Gradient |  | 0
Variance Reduction for Policy-Gradient Methods via Empirical Variance Minimization |  | 0
Variance Reduction for Policy Gradient with Action-Dependent Factorized Baselines |  | 0
Variance Reduction for Reinforcement Learning in Input-Driven Environments |  | 0
Variance Reduction in Actor Critic Methods (ACM) |  | 0
When Do Off-Policy and On-Policy Policy Gradient Methods Align? |  | 0
Diversity-Inducing Policy Gradient: Using Maximum Mean Discrepancy to Find a Set of Diverse Policies |  | 0
Zeroth-Order Supervised Policy Improvement |  | 0
2D or not 2D? Adaptive 3D Convolution Selection for Efficient Video Recognition |  | 0
Accelerated Reinforcement Learning |  | 0
Accelerating Policy Gradient by Estimating Value Function from Prior Computation in Deep Reinforcement Learning |  | 0
Action-dependent Control Variates for Policy Optimization via Stein Identity |  | 0
Actor-Critic Policy Optimization in a Large-Scale Imperfect-Information Game |  | 0
Actor-Critic Reinforcement Learning with Phased Actor |  | 0
AdaFrame: Adaptive Frame Selection for Fast Video Recognition |  | 0
Confidence-Controlled Exploration: Efficient Sparse-Reward Policy Learning for Robot Navigation |  | 0
Adaptive Batch Size for Safe Policy Gradients |  | 0
Momentum-Based Policy Gradient with Second-Order Information |  | 0
Adaptive Policy Learning to Additional Tasks |  | 0
Adaptive Step-Size for Policy Gradient Methods |  | 0
Ad Headline Generation using Self-Critical Masked Language Model |  | 0
Adversarial Policy Gradient for Alternating Markov Games |  | 0
A Hybrid Approach Between Adversarial Generative Networks and Actor-Critic Policy Gradient for Low Rate High-Resolution Image Compression |  | 0
A K-fold Method for Baseline Estimation in Policy Gradient Algorithms |  | 0
A Large Deviations Perspective on Policy Gradient Algorithms |  | 0
All-Action Policy Gradient Methods: A Numerical Integration Approach |  | 0

No leaderboard results yet.