| Learning to Trust Bellman Updates: Selective State-Adaptive Regularization for Offline RL | May 26, 2025 | D4RL, Offline RL | Code Available | 0 |
| GenPO: Generative Diffusion Models Meet On-Policy Reinforcement Learning | May 24, 2025 | GPU, Offline RL | Unverified | 0 |
| Diffusion Self-Weighted Guidance for Offline Reinforcement Learning | May 23, 2025 | Offline RL, reinforcement-learning | Unverified | 0 |
| Efficient Online RL Fine Tuning with Offline Pre-trained Policy Only | May 22, 2025 | Imitation Learning, Offline RL | Unverified | 0 |
| PyTupli: A Scalable Infrastructure for Collaborative Offline Reinforcement Learning Projects | May 22, 2025 | Offline RL, Reinforcement Learning (RL) | Code Available | 0 |
| Offline Guarded Safe Reinforcement Learning for Medical Treatment Optimization Strategies | May 22, 2025 | Offline RL, Q-Learning | Unverified | 0 |
| Think-J: Learning to Think for Generative LLM-as-a-Judge | May 20, 2025 | Offline RL, Reinforcement Learning (RL) | Code Available | 0 |
| Unearthing Gems from Stones: Policy Optimization with Negative Sample Augmentation for LLM Reasoning | May 20, 2025 | Math, Offline RL | Unverified | 0 |
| Your Offline Policy is Not Trustworthy: Bilevel Reinforcement Learning for Sequential Portfolio Optimization | May 19, 2025 | Offline RL, Portfolio Optimization | Unverified | 0 |
| Prior-Guided Diffusion Planning for Offline Reinforcement Learning | May 16, 2025 | Decision Making, Denoising | Unverified | 0 |