| Direct Retrieval-augmented Optimization: Synergizing Knowledge Selection and Language Models | May 5, 2025 | Policy Gradient Methods, RAG | Code Available | 3 |
| Ekar: An Explainable Method for Knowledge Aware Recommendation | Jun 22, 2019 | Knowledge-Aware Recommendation, Knowledge Graphs | Code Available | 2 |
| Proximal Policy Optimization Algorithms | Jul 20, 2017 | Continuous Control, Dota 2 | Code Available | 2 |
| Efficient Diffusion Policies for Offline Reinforcement Learning | May 31, 2023 | D4RL, Offline RL | Code Available | 1 |
| Model-free Policy Learning with Reward Gradients | Mar 9, 2021 | Continuous Control, model | Code Available | 1 |
| Lifelong Policy Gradient Learning of Factored Policies for Faster Training Without Forgetting | Jul 14, 2020 | Lifelong learning, Policy Gradient Methods | Code Available | 1 |
| The Sufficiency of Off-Policyness and Soft Clipping: PPO is still Insufficient according to an Off-Policy Measure | May 20, 2022 | Efficient Exploration, Policy Gradient Methods | Code Available | 1 |
| Fine-Tuning Discrete Diffusion Models with Policy Gradient Methods | Feb 3, 2025 | Language Modeling, Language Modelling | Code Available | 1 |
| Online Portfolio Management via Deep Reinforcement Learning with High-Frequency Data | May 1, 2023 | Deep Reinforcement Learning, Management | Code Available | 1 |
| Self-Improvement for Neural Combinatorial Optimization: Sample without Replacement, but Improvement | Mar 22, 2024 | Combinatorial Optimization, Imitation Learning | Code Available | 1 |
| Reevaluating Policy Gradient Methods for Imperfect-Information Games | Feb 13, 2025 | counterfactual, Deep Reinforcement Learning | Code Available | 1 |
| Reactive Exploration to Cope with Non-Stationarity in Lifelong Reinforcement Learning | Jul 12, 2022 | Lifelong learning, Policy Gradient Methods | Code Available | 1 |
| An Attentive Graph Agent for Topology-Adaptive Cyber Defence | Jan 24, 2025 | Graph Attention, Graph Neural Network | Code Available | 1 |
| Experimental design for MRI by greedy policy search | Oct 30, 2020 | Experimental Design, Policy Gradient Methods | Code Available | 1 |
| Deep Bayesian Quadrature Policy Optimization | Jun 28, 2020 | continuous-control, Continuous Control | Code Available | 1 |
| Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning | Nov 4, 2018 | Decoder, Multi-agent Reinforcement Learning | Code Available | 1 |
| Efficient Wasserstein Natural Gradients for Reinforcement Learning | Oct 12, 2020 | Policy Gradient Methods, reinforcement-learning | Code Available | 1 |
| Episodic Policy Gradient Training | Dec 3, 2021 | Policy Gradient Methods, Scheduling | Code Available | 1 |
| Learning Opinion Summarizers by Selecting Informative Reviews | Sep 9, 2021 | Few-Shot Learning, Opinion Summarization | Code Available | 1 |
| Neural Inventory Control in Networks via Hindsight Differentiable Policy Optimization | Jun 20, 2023 | Deep Reinforcement Learning, Management | Code Available | 1 |
| Partial advantage estimator for proximal policy optimization | Jan 26, 2023 | MuJoCo, Policy Gradient Methods | Code Available | 1 |
| Self-critical Sequence Training for Image Captioning | Dec 2, 2016 | Image Captioning, Policy Gradient Methods | Code Available | 1 |
| Trust Region Policy Optimization | Feb 19, 2015 | Atari Games, Policy Gradient Methods | Code Available | 1 |
| Transform2Act: Learning a Transform-and-Control Policy for Efficient Agent Design | Oct 7, 2021 | Decision Making, Policy Gradient Methods | Code Available | 1 |
| Continuous MDP Homomorphisms and Homomorphic Policy Gradient | Sep 15, 2022 | continuous-control, Continuous Control | Code Available | 1 |
| Policy Gradient Methods in the Presence of Symmetries and State Abstractions | May 9, 2023 | continuous-control, Continuous Control | Code Available | 1 |
| Distributional Policy Optimization: An Alternative Approach for Continuous Control | May 23, 2019 | continuous-control, Continuous Control | Code Available | 1 |
| An Efficient Asynchronous Method for Integrating Evolutionary and Gradient-based Policy Search | Dec 10, 2020 | continuous-control, Continuous Control | Code Available | 1 |
| Learning Multi-Agent Communication through Structured Attentive Reasoning | Dec 1, 2020 | Decision Making, Deep Reinforcement Learning | Code Available | 1 |
| Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization | Oct 3, 2022 | Decision Making, Policy Gradient Methods | Code Available | 1 |
| Invariant Policy Optimization: Towards Stronger Generalization in Reinforcement Learning | Jun 1, 2020 | Policy Gradient Methods, reinforcement-learning | Code Available | 1 |
| Competitive Policy Optimization | Jun 18, 2020 | Policy Gradient Methods | Code Available | 1 |
| StepTool: A Step-grained Reinforcement Learning Framework for Tool Learning in LLMs | Oct 10, 2024 | Information Retrieval, Policy Gradient Methods | Code Available | 1 |
| Deep Policy Gradient Methods Without Batch Updates, Target Networks, or Replay Buffers | Nov 22, 2024 | Avg, Deep Reinforcement Learning | Code Available | 1 |
| Divergence-Augmented Policy Optimization | Jan 25, 2025 | Atari Games, Deep Reinforcement Learning | Code Available | 1 |
| An Off-policy Policy Gradient Theorem Using Emphatic Weightings | Nov 22, 2018 | Policy Gradient Methods, Reinforcement Learning | Unverified | 0 |
| An Improved Analysis of (Variance-Reduced) Policy Gradient and Natural Policy Gradient Methods | Nov 15, 2022 | Policy Gradient Methods | Unverified | 0 |
| Momentum-Based Policy Gradient with Second-Order Information | May 17, 2022 | Policy Gradient Methods | Unverified | 0 |
| Adaptive Batch Size for Safe Policy Gradients | Dec 1, 2017 | Policy Gradient Methods, Reinforcement Learning | Unverified | 0 |
| 2D or not 2D? Adaptive 3D Convolution Selection for Efficient Video Recognition | Dec 29, 2020 | Action Recognition, Policy Gradient Methods | Unverified | 0 |
| Analysis of On-policy Policy Gradient Methods under the Distribution Mismatch | Mar 28, 2025 | Policy Gradient Methods | Unverified | 0 |
| Analysis and Improvement of Policy Gradient Estimation | Dec 1, 2011 | Policy Gradient Methods, reinforcement-learning | Unverified | 0 |
| Confidence-Controlled Exploration: Efficient Sparse-Reward Policy Learning for Robot Navigation | Jun 9, 2023 | Policy Gradient Methods, reinforcement-learning | Unverified | 0 |
| Almost sure convergence rates of stochastic gradient methods under gradient domination | May 22, 2024 | Policy Gradient Methods, reinforcement-learning | Unverified | 0 |
| Batch Policy Gradient Methods for Improving Neural Conversation Models | Feb 10, 2017 | Chatbot, Policy Gradient Methods | Unverified | 0 |
| All-Action Policy Gradient Methods: A Numerical Integration Approach | Oct 21, 2019 | All, continuous-control | Unverified | 0 |
| AdaFrame: Adaptive Frame Selection for Fast Video Recognition | Nov 29, 2018 | Policy Gradient Methods, Video Recognition | Unverified | 0 |
| Accelerating Policy Gradient by Estimating Value Function from Prior Computation in Deep Reinforcement Learning | Feb 2, 2023 | Deep Reinforcement Learning, Policy Gradient Methods | Unverified | 0 |
| Batch Reinforcement Learning with a Nonparametric Off-Policy Policy Gradient | Oct 27, 2020 | Policy Gradient Methods, reinforcement-learning | Unverified | 0 |
| BOTS: Batch Bayesian Optimization of Extended Thompson Sampling for Severely Episode-Limited RL Settings | Nov 30, 2024 | Bayesian Optimization, Policy Gradient Methods | Unverified | 0 |