| Differentiable Tree Search Network | Jan 22, 2024 | Decision Making, Inductive Bias | Code Available | 5 |
| A Clean Slate for Offline Reinforcement Learning | Apr 15, 2025 | Offline RL, reinforcement-learning | Code Available | 3 |
| Flow Q-Learning | Feb 4, 2025 | Action Generation, D4RL | Code Available | 3 |
| DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning | Jun 14, 2024 | Offline RL | Code Available | 3 |
| Is Value Learning Really the Main Bottleneck in Offline RL? | Jun 13, 2024 | Imitation Learning, Offline RL | Code Available | 3 |
| Diffusion Guidance Is a Controllable Policy Improvement Operator | May 29, 2025 | Offline RL | Code Available | 2 |
| What Makes a Good Diffusion Planner for Decision Making? | Mar 1, 2025 | Action Generation, Decision Making | Code Available | 2 |
| Offline Reinforcement Learning for LLM Multi-Step Reasoning | Dec 20, 2024 | GSM8K, Math | Code Available | 2 |
| Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data | Dec 10, 2024 | Offline RL, Reinforcement Learning (RL) | Code Available | 2 |
| Revisiting Generative Policies: A Simpler Reinforcement Learning Algorithmic Perspective | Dec 2, 2024 | Density Estimation, Offline RL | Code Available | 2 |