| Global Convergence of Natural Policy Gradient with Hessian-aided Momentum Variance Reduction | Jan 2, 2024 | MuJoCo, Policy Gradient Methods | —Unverified | 0 |
| Detecting and Mitigating Reward Hacking in Reinforcement Learning Systems: A Comprehensive Empirical Study | Jul 8, 2025 | MuJoCo, Recommendation Systems | —Unverified | 0 |
| Fast Convergence of Softmax Policy Mirror Ascent | Nov 18, 2024 | MuJoCo | —Unverified | 0 |
| FastTD3: Simple, Fast, and Capable Reinforcement Learning for Humanoid Control | May 28, 2025 | GPU, Humanoid Control | —Unverified | 0 |
| A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning | Dec 12, 2023 | MuJoCo, Offline RL | —Unverified | 0 |
| Feasible Adversarial Robust Reinforcement Learning for Underspecified Environments | Jul 19, 2022 | MuJoCo, reinforcement-learning | —Unverified | 0 |
| C-GAIL: Stabilizing Generative Adversarial Imitation Learning with Control Theory | Feb 26, 2024 | Imitation Learning, MuJoCo | —Unverified | 0 |
| Fight fire with fire: countering bad shortcuts in imitation learning with good shortcuts | Sep 29, 2021 | Autonomous Driving, continuous-control | —Unverified | 0 |
| Fighting Fire with Fire: Avoiding DNN Shortcuts through Priming | Jun 22, 2022 | Autonomous Driving, Classification | —Unverified | 0 |
| Aligning Humans and Robots via Reinforcement Learning from Implicit Human Feedback | Jul 17, 2025 | EEG, MuJoCo | —Unverified | 0 |