SOTAVerified

Policy Gradient Methods

Papers

Showing 251-300 of 382 papers

| Title | Status | Hype |
|---|---|---|
| Policy-Aware Model Learning for Policy Gradient Methods | Code | 0 |
| GACEM: Generalized Autoregressive Cross Entropy Method for Multi-Modal Black Box Constraint Satisfaction | | 0 |
| On the Convergence Theory of Debiased Model-Agnostic Meta-Reinforcement Learning | Code | 0 |
| Statistically Efficient Off-Policy Policy Gradients | | 0 |
| Bayesian Residual Policy Optimization: Scalable Bayesian Reinforcement Learning with Clairvoyant Experts | | 0 |
| Neural MMO v1.3: A Massively Multiagent Game Environment for Training and Evaluating Neural Networks | | 0 |
| Deep Reinforcement Learning based Blind mmWave MIMO Beam Alignment | | 0 |
| A Nonparametric Off-Policy Policy Gradient | Code | 0 |
| Entropy Regularization with Discounted Future State Distribution in Policy Gradient Methods | | 0 |
| Fast Efficient Hyperparameter Tuning for Policy Gradient Methods | Code | 0 |
| Optimal Resource Allocation in Wireless Control Systems via Deep Policy Gradient | | 0 |
| All-Action Policy Gradient Methods: A Numerical Integration Approach | | 0 |
| Policy Optimization for H_2 Linear Control with H_∞ Robustness Guarantee: Implicit Regularization and Global Convergence | | 0 |
| Linear-Quadratic Mean-Field Reinforcement Learning: Convergence of Policy Gradient Methods | | 0 |
| V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control | Code | 0 |
| AUGMENTED POLICY GRADIENT METHODS FOR EFFICIENT REINFORCEMENT LEARNING | | 0 |
| Guided Adaptive Credit Assignment for Sample Efficient Policy Optimization | | 0 |
| Policy Tree Network | | 0 |
| DeepGait: Planning and Control of Quadrupedal Gaits using Deep Reinforcement Learning | | 0 |
| Sample Efficient Policy Gradient Methods with Recursive Variance Reduction | Code | 0 |
| Deep Reinforcement Learning Algorithm for Dynamic Pricing of Express Lanes with Multiple Access Locations | Code | 0 |
| Transfer Reward Learning for Policy Gradient-Based Text Generation | | 0 |
| Multi Pseudo Q-learning Based Deterministic Policy Gradient for Tracking Control of Autonomous Underwater Vehicles | | 0 |
| Neural Policy Gradient Methods: Global Optimality and Rates of Convergence | | 0 |
| Trajectory-wise Control Variates for Variance Reduction in Policy Gradient Methods | | 0 |
| Health-Informed Policy Gradients for Multi-Agent Reinforcement Learning | Code | 0 |
| On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift | | 0 |
| Hindsight Trust Region Policy Optimization | Code | 0 |
| Variance Reduction in Actor Critic Methods (ACM) | | 0 |
| Shapley Q-value: A Local Reward Approach to Solve Global Reward Games | Code | 0 |
| Policy Optimization with Stochastic Mirror Descent | | 0 |
| Ranking Policy Gradient | Code | 0 |
| Ekar: An Explainable Method for Knowledge Aware Recommendation | Code | 2 |
| Entropic Risk Measure in Policy Search | | 0 |
| Global Convergence of Policy Gradient Methods to (Almost) Locally Optimal Policies | | 0 |
| Is the Policy Gradient a Gradient? | | 0 |
| A Hybrid Approach Between Adversarial Generative Networks and Actor-Critic Policy Gradient for Low Rate High-Resolution Image Compression | | 0 |
| Global Optimality Guarantees For Policy Gradient Methods | | 0 |
| Neural Replicator Dynamics | Code | 0 |
| Diversity-Inducing Policy Gradient: Using Maximum Mean Discrepancy to Find a Set of Diverse Policies | | 0 |
| Policy Search by Target Distribution Learning for Continuous Control | | 0 |
| Distributional Policy Optimization: An Alternative Approach for Continuous Control | Code | 1 |
| Trajectory-Based Off-Policy Deep Reinforcement Learning | Code | 0 |
| Learning Novel Policies For Tasks | | 0 |
| Object Exchangeability in Reinforcement Learning: Extended Abstract | | 0 |
| Neural Logic Reinforcement Learning | Code | 0 |
| Similarities between policy gradient methods (PGM) in Reinforcement learning (RL) and supervised learning (SL) | | 0 |
| Only Relevant Information Matters: Filtering Out Noisy Samples to Boost RL | | 0 |
| StartNet: Online Detection of Action Start in Untrimmed Videos | | 0 |
| Evaluating Rewards for Question Generation Models | Code | 0 |
Page 6 of 8

No leaderboard results yet.