SOTAVerified

Policy Gradient Methods

Papers

Showing 1-50 of 382 papers

| Title | Status | Hype |
| --- | --- | --- |
| Direct Retrieval-augmented Optimization: Synergizing Knowledge Selection and Language Models | Code | 3 |
| Ekar: An Explainable Method for Knowledge Aware Recommendation | Code | 2 |
| Proximal Policy Optimization Algorithms | Code | 2 |
| Efficient Diffusion Policies for Offline Reinforcement Learning | Code | 1 |
| Model-free Policy Learning with Reward Gradients | Code | 1 |
| Lifelong Policy Gradient Learning of Factored Policies for Faster Training Without Forgetting | Code | 1 |
| The Sufficiency of Off-Policyness and Soft Clipping: PPO is still Insufficient according to an Off-Policy Measure | Code | 1 |
| Fine-Tuning Discrete Diffusion Models with Policy Gradient Methods | Code | 1 |
| Online Portfolio Management via Deep Reinforcement Learning with High-Frequency Data | Code | 1 |
| Self-Improvement for Neural Combinatorial Optimization: Sample without Replacement, but Improvement | Code | 1 |
| Reevaluating Policy Gradient Methods for Imperfect-Information Games | Code | 1 |
| Reactive Exploration to Cope with Non-Stationarity in Lifelong Reinforcement Learning | Code | 1 |
| An Attentive Graph Agent for Topology-Adaptive Cyber Defence | Code | 1 |
| Experimental design for MRI by greedy policy search | Code | 1 |
| Deep Bayesian Quadrature Policy Optimization | Code | 1 |
| Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning | Code | 1 |
| Efficient Wasserstein Natural Gradients for Reinforcement Learning | Code | 1 |
| Episodic Policy Gradient Training | Code | 1 |
| Learning Opinion Summarizers by Selecting Informative Reviews | Code | 1 |
| Neural Inventory Control in Networks via Hindsight Differentiable Policy Optimization | Code | 1 |
| Partial advantage estimator for proximal policy optimization | Code | 1 |
| Self-critical Sequence Training for Image Captioning | Code | 1 |
| Trust Region Policy Optimization | Code | 1 |
| Transform2Act: Learning a Transform-and-Control Policy for Efficient Agent Design | Code | 1 |
| Continuous MDP Homomorphisms and Homomorphic Policy Gradient | Code | 1 |
| Policy Gradient Methods in the Presence of Symmetries and State Abstractions | Code | 1 |
| Distributional Policy Optimization: An Alternative Approach for Continuous Control | Code | 1 |
| An Efficient Asynchronous Method for Integrating Evolutionary and Gradient-based Policy Search | Code | 1 |
| Learning Multi-Agent Communication through Structured Attentive Reasoning | Code | 1 |
| Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization | Code | 1 |
| Invariant Policy Optimization: Towards Stronger Generalization in Reinforcement Learning | Code | 1 |
| Competitive Policy Optimization | Code | 1 |
| StepTool: A Step-grained Reinforcement Learning Framework for Tool Learning in LLMs | Code | 1 |
| Deep Policy Gradient Methods Without Batch Updates, Target Networks, or Replay Buffers | Code | 1 |
| Divergence-Augmented Policy Optimization | Code | 1 |
| An Off-policy Policy Gradient Theorem Using Emphatic Weightings | | 0 |
| An Improved Analysis of (Variance-Reduced) Policy Gradient and Natural Policy Gradient Methods | | 0 |
| Momentum-Based Policy Gradient with Second-Order Information | | 0 |
| Adaptive Batch Size for Safe Policy Gradients | | 0 |
| 2D or not 2D? Adaptive 3D Convolution Selection for Efficient Video Recognition | | 0 |
| Analysis of On-policy Policy Gradient Methods under the Distribution Mismatch | | 0 |
| Analysis and Improvement of Policy Gradient Estimation | | 0 |
| Confidence-Controlled Exploration: Efficient Sparse-Reward Policy Learning for Robot Navigation | | 0 |
| Almost sure convergence rates of stochastic gradient methods under gradient domination | | 0 |
| Batch Policy Gradient Methods for Improving Neural Conversation Models | | 0 |
| All-Action Policy Gradient Methods: A Numerical Integration Approach | | 0 |
| AdaFrame: Adaptive Frame Selection for Fast Video Recognition | | 0 |
| Accelerating Policy Gradient by Estimating Value Function from Prior Computation in Deep Reinforcement Learning | | 0 |
| Batch Reinforcement Learning with a Nonparametric Off-Policy Policy Gradient | | 0 |
| BOTS: Batch Bayesian Optimization of Extended Thompson Sampling for Severely Episode-Limited RL Settings | | 0 |

No leaderboard results yet.