SOTAVerified

Policy Gradient Methods

Papers

Showing 1-25 of 382 papers

Title | Status | Hype
Direct Retrieval-augmented Optimization: Synergizing Knowledge Selection and Language Models | Code | 3
Ekar: An Explainable Method for Knowledge Aware Recommendation | Code | 2
Proximal Policy Optimization Algorithms | Code | 2
Reevaluating Policy Gradient Methods for Imperfect-Information Games | Code | 1
Fine-Tuning Discrete Diffusion Models with Policy Gradient Methods | Code | 1
Divergence-Augmented Policy Optimization | Code | 1
An Attentive Graph Agent for Topology-Adaptive Cyber Defence | Code | 1
Deep Policy Gradient Methods Without Batch Updates, Target Networks, or Replay Buffers | Code | 1
StepTool: A Step-grained Reinforcement Learning Framework for Tool Learning in LLMs | Code | 1
Self-Improvement for Neural Combinatorial Optimization: Sample without Replacement, but Improvement | Code | 1
Neural Inventory Control in Networks via Hindsight Differentiable Policy Optimization | Code | 1
Efficient Diffusion Policies for Offline Reinforcement Learning | Code | 1
Policy Gradient Methods in the Presence of Symmetries and State Abstractions | Code | 1
Online Portfolio Management via Deep Reinforcement Learning with High-Frequency Data | Code | 1
Partial advantage estimator for proximal policy optimization | Code | 1
Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization | Code | 1
Continuous MDP Homomorphisms and Homomorphic Policy Gradient | Code | 1
Reactive Exploration to Cope with Non-Stationarity in Lifelong Reinforcement Learning | Code | 1
The Sufficiency of Off-Policyness and Soft Clipping: PPO is still Insufficient according to an Off-Policy Measure | Code | 1
Episodic Policy Gradient Training | Code | 1
Transform2Act: Learning a Transform-and-Control Policy for Efficient Agent Design | Code | 1
Learning Opinion Summarizers by Selecting Informative Reviews | Code | 1
Model-free Policy Learning with Reward Gradients | Code | 1
An Efficient Asynchronous Method for Integrating Evolutionary and Gradient-based Policy Search | Code | 1
Learning Multi-Agent Communication through Structured Attentive Reasoning | Code | 1

No leaderboard results yet.