| Predicting Multiple Actions for Stochastic Continuous Control | Jan 1, 2018 | continuous-control, Continuous Control | —Unverified | 0 | 0 |
| On the Second-Order Convergence of Biased Policy Gradient Algorithms | Nov 5, 2023 | Policy Gradient Methods | —Unverified | 0 | 0 |
| Privacy Preserving Multi-Agent Reinforcement Learning in Supply Chains | Dec 9, 2023 | Multi-agent Reinforcement Learning, Policy Gradient Methods | —Unverified | 0 | 0 |
| Programmatic Reinforcement Learning without Oracles | Sep 29, 2021 | Bilevel Optimization, Deep Reinforcement Learning | —Unverified | 0 | 0 |
| Provable Policy Gradient Methods for Average-Reward Markov Potential Games | Mar 9, 2024 | Policy Gradient Methods | —Unverified | 0 | 0 |
| Provably Convergent Policy Optimization via Metric-aware Trust Region Methods | Jun 25, 2023 | continuous-control, Continuous Control | —Unverified | 0 | 0 |
| Provably Efficient Policy Optimization for Two-Player Zero-Sum Markov Games | Feb 17, 2021 | Policy Gradient Methods, Vocal Bursts Valence Prediction | —Unverified | 0 | 0 |
| Proximal Policy Optimization for Tracking Control Exploiting Future Reference Information | Jul 20, 2021 | Policy Gradient Methods, reinforcement-learning | —Unverified | 0 | 0 |
| Proximal Policy Optimization with Continuous Bounded Action Space via the Beta Distribution | Nov 3, 2021 | continuous-control, Continuous Control | —Unverified | 0 | 0 |
| Q-SFT: Q-Learning for Language Models via Supervised Fine-Tuning | Nov 7, 2024 | Offline RL, Policy Gradient Methods | —Unverified | 0 | 0 |