| Manifold Regularization for Kernelized LSTD | Oct 15, 2017 | Policy Gradient Methods, Reinforcement Learning | —Unverified | 0 |
| Optimal Control-Based Baseline for Guided Exploration in Policy Gradient Methods | Nov 4, 2020 | Deep Reinforcement Learning, Policy Gradient Methods | —Unverified | 0 |
| Learning to Constrain Policy Optimization with Virtual Trust Region | Apr 20, 2022 | Atari Games, Policy Gradient Methods | —Unverified | 0 |
| Meta Learning the Step Size in Policy Gradient Methods | May 20, 2021 | Meta-Learning, Meta Reinforcement Learning | —Unverified | 0 |
| Metastable Dynamics of Chain-of-Thought Reasoning: Provable Benefits of Search, RL and Distillation | Feb 2, 2025 | Policy Gradient Methods | —Unverified | 0 |
| Modularity in Reinforcement Learning via Algorithmic Independence in Credit Assignment | Jun 28, 2021 | Decision Making, Policy Gradient Methods | —Unverified | 0 |
| Mollification Effects of Policy Gradient Methods | May 28, 2024 | Continuous Control | —Unverified | 0 |
| Asynchronous, Option-Based Multi-Agent Policy Gradient: A Conditional Reasoning Approach | Mar 29, 2022 | Hierarchical Reinforcement Learning, Multi-agent Reinforcement Learning | —Unverified | 0 |
| Multiagent Soft Q-Learning | Apr 25, 2018 | Policy Gradient Methods, Q-Learning | —Unverified | 0 |
| Multi Pseudo Q-learning Based Deterministic Policy Gradient for Tracking Control of Autonomous Underwater Vehicles | Sep 7, 2019 | Policy Gradient Methods, Q-Learning | —Unverified | 0 |