SOTAVerified

Policy Gradient Methods

Papers

Showing 51–100 of 382 papers

Title | Status | Hype
Bayesian Policy Gradients via Alpha Divergence Dropout Inference | Code | 0
Remember and Forget for Experience Replay | Code | 0
Shapley Q-value: A Local Reward Approach to Solve Global Reward Games | Code | 0
Rethinking Action Spaces for Reinforcement Learning in End-to-end Dialog Agents with Latent Variable Models | Code | 0
Trajectory-Based Off-Policy Deep Reinforcement Learning | Code | 0
Oracle Complexity Reduction for Model-free LQR: A Stochastic Variance-Reduced Policy Gradient Approach | Code | 0
Divide-and-Conquer Reinforcement Learning | Code | 0
On-Policy Trust Region Policy Optimisation with Replay Buffers | Code | 0
Clipped Action Policy Gradient | Code | 0
Clipped-Objective Policy Gradients for Pessimistic Policy Optimization | Code | 0
Cold-Start Reinforcement Learning with Softmax Policy Gradient | Code | 0
The Mirage of Action-Dependent Baselines in Reinforcement Learning | Code | 0
On Learning Intrinsic Rewards for Policy Gradient Methods | Code | 0
Distributional constrained reinforcement learning for supply chain optimization | Code | 0
Greedy Actor-Critic: A New Conditional Cross-Entropy Method for Policy Improvement | Code | 0
PC-PG: Policy Cover Directed Exploration for Provable Policy Gradient Learning | Code | 0
Multilinear Tensor Low-Rank Approximation for Policy-Gradient Methods in Reinforcement Learning | Code | 0
Deep Reinforcement Learning for Dialogue Generation | Code | 0
Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form | Code | 0
MDPGT: Momentum-based Decentralized Policy Gradient Tracking | Code | 0
A general class of surrogate functions for stable and efficient reinforcement learning | Code | 0
Model-free and Bayesian Ensembling Model-based Deep Reinforcement Learning for Particle Accelerator Control Demonstrated on the FERMI FEL | Code | 0
Policy-Aware Model Learning for Policy Gradient Methods | Code | 0
Learning Goal-Oriented Visual Dialog via Tempered Policy Gradient | Code | 0
Jointly Learning Environments and Control Policies with Projected Stochastic Gradient Ascent | Code | 0
Hindsight Trust Region Policy Optimization | Code | 0
Hindsight policy gradients | Code | 0
Hindsight Value Function for Variance Reduction in Stochastic Dynamic Environment | Code | 0
High-Dimensional Continuous Control Using Generalized Advantage Estimation | Code | 0
Convergence Guarantees of Model-free Policy Gradient Methods for LQR with Stochastic Data | Code | 0
Deep Reinforcement Learning Algorithm for Dynamic Pricing of Express Lanes with Multiple Access Locations | Code | 0
Matrix Low-Rank Approximation For Policy Gradient Methods | Code | 0
Hindsight-DICE: Stable Credit Assignment for Deep Reinforcement Learning | Code | 0
Neural Replicator Dynamics | Code | 0
Momentum-Based Policy Gradient Methods | Code | 0
Improving Exploration in Evolution Strategies for Deep Reinforcement Learning via a Population of Novelty-Seeking Agents | Code | 0
Learning Zero-Sum Linear Quadratic Games with Improved Sample Complexity and Last-Iterate Convergence | Code | 0
Enabling Efficient, Reliable Real-World Reinforcement Learning with Approximate Physics-Based Models | Code | 0
Global Optimality and Finite Sample Analysis of Softmax Off-Policy Actor Critic under State Distribution Mismatch | Code | 0
Neural Logic Reinforcement Learning | Code | 0
Fast Efficient Hyperparameter Tuning for Policy Gradients | Code | 0
Dual Learning for Machine Translation | Code | 0
Policy Gradient in Robust MDPs with Global Convergence Guarantee | Code | 0
Fast Efficient Hyperparameter Tuning for Policy Gradient Methods | Code | 0
Health-Informed Policy Gradients for Multi-Agent Reinforcement Learning | Code | 0
Accelerated Policy Gradient: On the Convergence Rates of the Nesterov Momentum for Reinforcement Learning | Code | 0
Evaluating Rewards for Question Generation Models | Code | 0
A Nonparametric Off-Policy Policy Gradient | Code | 0
Hierarchical Policy-Gradient Reinforcement Learning for Multi-Agent Shepherding Control of Non-Cohesive Targets | Code | 0
Leveraging class abstraction for commonsense reinforcement learning via residual policy gradient methods | Code | 0

No leaderboard results yet.