| Learning Self-Imitating Diverse Policies | May 25, 2018 | Continuous Control | —Unverified | 0 | 0 |
| Learning to Interrupt: A Hierarchical Deep Reinforcement Learning Framework for Efficient Exploration | Jul 30, 2018 | Deep Reinforcement Learning, Efficient Exploration | —Unverified | 0 | 0 |
| Actor-Critic Reinforcement Learning with Phased Actor | Apr 18, 2024 | Policy Gradient Methods, reinforcement-learning | —Unverified | 0 | 0 |
| Improving Reward-Conditioned Policies for Multi-Armed Bandits using Normalized Weight Functions | Jun 16, 2024 | Multi-Armed Bandits, Policy Gradient Methods | —Unverified | 0 | 0 |
| DeepGait: Planning and Control of Quadrupedal Gaits using Deep Reinforcement Learning | Sep 18, 2019 | Deep Reinforcement Learning, Motion Planning | —Unverified | 0 | 0 |
| Improving DAPO from a Mixed-Policy Perspective | Jul 17, 2025 | Policy Gradient Methods | —Unverified | 0 | 0 |
| Policy Gradient Methods for Distortion Risk Measures | Jul 9, 2021 | Policy Gradient Methods, reinforcement-learning | —Unverified | 0 | 0 |
| Linear convergence of a policy gradient method for some finite horizon continuous time control problems | Mar 22, 2022 | Policy Gradient Methods, reinforcement-learning | —Unverified | 0 | 0 |
| Improving a sequence-to-sequence nlp model using a reinforcement learning policy algorithm | Dec 28, 2022 | Chatbot, Deep Reinforcement Learning | —Unverified | 0 | 0 |
| A Self-Supervised Reinforcement Learning Approach for Fine-Tuning Large Language Models Using Cross-Attention Signals | Feb 14, 2025 | Policy Gradient Methods | —Unverified | 0 | 0 |