| Stabilizing Extreme Q-learning by Maclaurin Expansion | Jun 7, 2024 | D4RL, Offline RL | Code Available | 0 |
| Strategically Conservative Q-Learning | Jun 6, 2024 | D4RL, Offline RL | Code Available | 1 |
| Self-Play with Adversarial Critic: Provable and Scalable Offline Alignment for Language Models | Jun 6, 2024 | Offline RL, reinforcement-learning | Unverified | 0 |
| UDQL: Bridging The Gap between MSE Loss and The Optimal Value Function in Offline Reinforcement Learning | Jun 5, 2024 | D4RL, Offline RL | Unverified | 0 |
| A Fast Convergence Theory for Offline Decision Making | Jun 3, 2024 | Decision Making, Offline RL | Unverified | 0 |
| Causal prompting model-based offline reinforcement learning | Jun 3, 2024 | model, Offline RL | Unverified | 0 |
| Diffusion Policies creating a Trust Region for Offline Reinforcement Learning | May 30, 2024 | D4RL, Denoising | Code Available | 1 |
| Inverse Concave-Utility Reinforcement Learning is Inverse Game Theory | May 29, 2024 | Imitation Learning, Offline RL | Unverified | 0 |
| Preferred-Action-Optimized Diffusion Policies for Offline Reinforcement Learning | May 29, 2024 | Offline RL, reinforcement-learning | Unverified | 0 |
| Reinforcement Learning in Dynamic Treatment Regimes Needs Critical Reexamination | May 28, 2024 | Offline RL, reinforcement-learning | Code Available | 1 |