| From Novelty to Imitation: Self-Distilled Rewards for Offline Reinforcement Learning | Jul 17, 2025 | D4RL, Offline RL | Unverified | 0 |
| Step-wise Policy for Rare-tool Knowledge (SPaRK): Offline RL that Drives Diverse Tool Use in LLMs | Jul 15, 2025 | Diversity, MMLU | Code Available | 0 |
| Robust Bandwidth Estimation for Real-Time Communication with Offline Reinforcement Learning | Jul 8, 2025 | Offline RL, Reinforcement Learning (RL) | Unverified | 0 |
| Flow-Based Single-Step Completion for Efficient and Expressive Policy Learning | Jun 26, 2025 | Action Generation, Decision Making | Unverified | 0 |
| Optimal Single-Policy Sample Complexity and Transient Coverage for Average-Reward Offline RL | Jun 26, 2025 | Offline RL | Unverified | 0 |
| Sparse-Reg: Improving Sample Complexity in Offline Reinforcement Learning using Sparsity | Jun 20, 2025 | continuous-control, Continuous Control | Code Available | 0 |
| CAWR: Corruption-Averse Advantage-Weighted Regression for Robust Policy Optimization | Jun 18, 2025 | D4RL, Offline RL | Code Available | 0 |
| IntelliLung: Advancing Safe Mechanical Ventilation using Offline RL with Hybrid Actions and Clinically Aligned Rewards | Jun 17, 2025 | Offline RL, Reinforcement Learning (RL) | Unverified | 0 |
| Toward Explainable Offline RL: Analyzing Representations in Intrinsically Motivated Decision Transformers | Jun 16, 2025 | Decision Making, Decision Making Under Uncertainty | Unverified | 0 |
| DR-SAC: Distributionally Robust Soft Actor-Critic for Reinforcement Learning under Uncertainty | Jun 14, 2025 | continuous-control, Continuous Control | Code Available | 0 |
| MOORL: A Framework for Integrating Offline-Online Reinforcement Learning | Jun 11, 2025 | D4RL, Deep Reinforcement Learning | Unverified | 0 |
| Policy-Based Trajectory Clustering in Offline Reinforcement Learning | Jun 10, 2025 | Clustering, D4RL | Unverified | 0 |
| MOBODY: Model Based Off-Dynamics Offline Reinforcement Learning | Jun 10, 2025 | Data Augmentation, model | Code Available | 0 |
| Offline RL with Smooth OOD Generalization in Convex Hull and its Neighborhood | Jun 10, 2025 | Computational Efficiency, D4RL | Code Available | 0 |
| Semi-gradient DICE for Offline Constrained Reinforcement Learning | Jun 10, 2025 | Offline RL, Off-policy evaluation | Unverified | 0 |
| How to Provably Improve Return Conditioned Supervised Learning? | Jun 10, 2025 | Decision Making, Offline RL | Unverified | 0 |
| Accelerating Diffusion Models in Offline RL via Reward-Aware Consistency Trajectory Distillation | Jun 9, 2025 | Decision Making, MuJoCo | Unverified | 0 |
| Learning to Clarify by Reinforcement Learning Through Reward-Weighted Fine-Tuning | Jun 8, 2025 | Offline RL, Question Answering | Unverified | 0 |
| ADG: Ambient Diffusion-Guided Dataset Recovery for Corruption-Robust Offline Reinforcement Learning | May 29, 2025 | Denoising, MuJoCo | Unverified | 0 |
| Enhanced DACER Algorithm with High Diffusion Efficiency | May 29, 2025 | Denoising, Imitation Learning | Unverified | 0 |
| Diffusion Guidance Is a Controllable Policy Improvement Operator | May 29, 2025 | Offline RL | Code Available | 2 |
| SOReL and TOReL: Two Methods for Fully Offline Reinforcement Learning | May 28, 2025 | Offline RL, reinforcement-learning | Code Available | 0 |
| Scaling Offline RL via Efficient and Expressive Shortcut Models | May 28, 2025 | Offline RL, reinforcement-learning | Unverified | 0 |
| Learning to Trust Bellman Updates: Selective State-Adaptive Regularization for Offline RL | May 26, 2025 | D4RL, Offline RL | Code Available | 0 |
| GenPO: Generative Diffusion Models Meet On-Policy Reinforcement Learning | May 24, 2025 | GPU, Offline RL | Unverified | 0 |