| Off-OAB: Off-Policy Policy Gradient Method with Optimal Action-Dependent Baseline | May 4, 2024 | Computational Efficiency, MuJoCo | —Unverified | 0 |
| Off-Policy Deep Reinforcement Learning Algorithms for Handling Various Robotic Manipulator Tasks | Dec 11, 2022 | Deep Reinforcement Learning, MuJoCo | —Unverified | 0 |
| One is More: Diverse Perspectives within a Single Network for Efficient DRL | Oct 21, 2023 | Deep Reinforcement Learning, MuJoCo | —Unverified | 0 |
| On-Policy Model Errors in Reinforcement Learning | Oct 15, 2021 | model, MuJoCo | —Unverified | 0 |
| On-Policy Policy Gradient Reinforcement Learning Without On-Policy Sampling | Nov 14, 2023 | MuJoCo, reinforcement-learning | —Unverified | 0 |
| On Proximal Policy Optimization's Heavy-tailed Gradients | Feb 20, 2021 | continuous-control, Continuous Control | —Unverified | 0 |
| On Representation Complexity of Model-based and Model-free Reinforcement Learning | Oct 3, 2023 | model, MuJoCo | —Unverified | 0 |
| On the Convergence Theory of Meta Reinforcement Learning with Personalized Policies | Sep 21, 2022 | continuous-control, Continuous Control | —Unverified | 0 |
| On the Geometry of Reinforcement Learning in Continuous State and Action Spaces | Dec 29, 2022 | MuJoCo, reinforcement-learning | —Unverified | 0 |
| OPAC: Opportunistic Actor-Critic | Dec 11, 2020 | continuous-control, Continuous Control | —Unverified | 0 |
| OVD-Explorer: A General Information-theoretic Exploration Approach for Reinforcement Learning | Sep 29, 2021 | MuJoCo, reinforcement-learning | —Unverified | 0 |
| OVD-Explorer: Optimism Should Not Be the Sole Pursuit of Exploration in Noisy Environments | Dec 19, 2023 | continuous-control, Continuous Control | —Unverified | 0 |
| Overcoming Model Bias for Robust Offline Deep Reinforcement Learning | Aug 12, 2020 | continuous-control, Continuous Control | —Unverified | 0 |
| Parareal with a Learned Coarse Model for Robotic Manipulation | Dec 12, 2019 | MuJoCo | —Unverified | 0 |
| Mean-Variance Policy Iteration for Risk-Averse Reinforcement Learning | Apr 22, 2020 | MuJoCo, reinforcement-learning | —Unverified | 0 |
| PGPS : Coupling Policy Gradient with Population-based Search | Jan 1, 2021 | Deep Reinforcement Learning, MuJoCo | —Unverified | 0 |
| Phasic Diversity Optimization for Population-Based Reinforcement Learning | Mar 17, 2024 | Diversity, MuJoCo | —Unverified | 0 |
| Policy-Driven World Model Adaptation for Robust Offline Model-based Reinforcement Learning | May 19, 2025 | D4RL, model | —Unverified | 0 |
| Policy Gradient with Kernel Quadrature | Oct 23, 2023 | Causal Discovery, MuJoCo | —Unverified | 0 |
| Policy Gradient With Serial Markov Chain Reasoning | Oct 13, 2022 | Decision Making, MuJoCo | —Unverified | 0 |
| Policy Optimization by Genetic Distillation | Nov 3, 2017 | Deep Reinforcement Learning, Imitation Learning | —Unverified | 0 |
| Certifiably Robust Reinforcement Learning through Model-Based Abstract Interpretation | Jan 26, 2023 | Adversarial Robustness, MuJoCo | —Unverified | 0 |
| Policy Prediction Network: Model-Free Behavior Policy with Model-Based Learning in Continuous Action Space | Sep 15, 2019 | continuous-control, Continuous Control | —Unverified | 0 |
| Policy Search by Target Distribution Learning for Continuous Control | May 27, 2019 | continuous-control, Continuous Control | —Unverified | 0 |
| Policy Search using Dynamic Mirror Descent MPC for Model Free Off Policy RL | Oct 23, 2021 | Model Predictive Control, MuJoCo | —Unverified | 0 |