| From Novelty to Imitation: Self-Distilled Rewards for Offline Reinforcement Learning | Jul 17, 2025 | D4RL, Offline RL | Unverified | 0 |
| Step-wise Policy for Rare-tool Knowledge (SPaRK): Offline RL that Drives Diverse Tool Use in LLMs | Jul 15, 2025 | Diversity, MMLU | Code Available | 0 |
| Robust Bandwidth Estimation for Real-Time Communication with Offline Reinforcement Learning | Jul 8, 2025 | Offline RL, Reinforcement Learning (RL) | Unverified | 0 |
| Optimal Single-Policy Sample Complexity and Transient Coverage for Average-Reward Offline RL | Jun 26, 2025 | Offline RL | Unverified | 0 |
| Flow-Based Single-Step Completion for Efficient and Expressive Policy Learning | Jun 26, 2025 | Action Generation, Decision Making | Unverified | 0 |
| Sparse-Reg: Improving Sample Complexity in Offline Reinforcement Learning using Sparsity | Jun 20, 2025 | continuous-control, Continuous Control | Code Available | 0 |
| CAWR: Corruption-Averse Advantage-Weighted Regression for Robust Policy Optimization | Jun 18, 2025 | D4RL, Offline RL | Code Available | 0 |
| IntelliLung: Advancing Safe Mechanical Ventilation using Offline RL with Hybrid Actions and Clinically Aligned Rewards | Jun 17, 2025 | Offline RL, Reinforcement Learning (RL) | Unverified | 0 |
| Toward Explainable Offline RL: Analyzing Representations in Intrinsically Motivated Decision Transformers | Jun 16, 2025 | Decision Making, Decision Making Under Uncertainty | Unverified | 0 |
| DR-SAC: Distributionally Robust Soft Actor-Critic for Reinforcement Learning under Uncertainty | Jun 14, 2025 | continuous-control, Continuous Control | Code Available | 0 |
| MOORL: A Framework for Integrating Offline-Online Reinforcement Learning | Jun 11, 2025 | D4RL, Deep Reinforcement Learning | Unverified | 0 |
| Policy-Based Trajectory Clustering in Offline Reinforcement Learning | Jun 10, 2025 | Clustering, D4RL | Unverified | 0 |
| Semi-gradient DICE for Offline Constrained Reinforcement Learning | Jun 10, 2025 | Offline RL, Off-policy evaluation | Unverified | 0 |
| MOBODY: Model Based Off-Dynamics Offline Reinforcement Learning | Jun 10, 2025 | Data Augmentation, model | Code Available | 0 |
| Offline RL with Smooth OOD Generalization in Convex Hull and its Neighborhood | Jun 10, 2025 | Computational Efficiency, D4RL | Code Available | 0 |
| How to Provably Improve Return Conditioned Supervised Learning? | Jun 10, 2025 | Decision Making, Offline RL | Unverified | 0 |
| Accelerating Diffusion Models in Offline RL via Reward-Aware Consistency Trajectory Distillation | Jun 9, 2025 | Decision Making, MuJoCo | Unverified | 0 |
| Learning to Clarify by Reinforcement Learning Through Reward-Weighted Fine-Tuning | Jun 8, 2025 | Offline RL, Question Answering | Unverified | 0 |
| ADG: Ambient Diffusion-Guided Dataset Recovery for Corruption-Robust Offline Reinforcement Learning | May 29, 2025 | Denoising, MuJoCo | Unverified | 0 |
| Enhanced DACER Algorithm with High Diffusion Efficiency | May 29, 2025 | Denoising, Imitation Learning | Unverified | 0 |
| Diffusion Guidance Is a Controllable Policy Improvement Operator | May 29, 2025 | Offline RL | Code Available | 2 |
| SOReL and TOReL: Two Methods for Fully Offline Reinforcement Learning | May 28, 2025 | Offline RL, reinforcement-learning | Code Available | 0 |
| Scaling Offline RL via Efficient and Expressive Shortcut Models | May 28, 2025 | Offline RL, reinforcement-learning | Unverified | 0 |
| Learning to Trust Bellman Updates: Selective State-Adaptive Regularization for Offline RL | May 26, 2025 | D4RL, Offline RL | Code Available | 0 |
| GenPO: Generative Diffusion Models Meet On-Policy Reinforcement Learning | May 24, 2025 | GPU, Offline RL | Unverified | 0 |
| Diffusion Self-Weighted Guidance for Offline Reinforcement Learning | May 23, 2025 | Offline RL, reinforcement-learning | Unverified | 0 |
| PyTupli: A Scalable Infrastructure for Collaborative Offline Reinforcement Learning Projects | May 22, 2025 | Offline RL, Reinforcement Learning (RL) | Code Available | 0 |
| Efficient Online RL Fine Tuning with Offline Pre-trained Policy Only | May 22, 2025 | Imitation Learning, Offline RL | Unverified | 0 |
| Offline Guarded Safe Reinforcement Learning for Medical Treatment Optimization Strategies | May 22, 2025 | Offline RL, Q-Learning | Unverified | 0 |
| Unearthing Gems from Stones: Policy Optimization with Negative Sample Augmentation for LLM Reasoning | May 20, 2025 | Math, Offline RL | Unverified | 0 |
| Think-J: Learning to Think for Generative LLM-as-a-Judge | May 20, 2025 | Offline RL, Reinforcement Learning (RL) | Code Available | 0 |
| Your Offline Policy is Not Trustworthy: Bilevel Reinforcement Learning for Sequential Portfolio Optimization | May 19, 2025 | Offline RL, Portfolio Optimization | Unverified | 0 |
| Prior-Guided Diffusion Planning for Offline Reinforcement Learning | May 16, 2025 | Decision Making, Denoising | Unverified | 0 |
| ImagineBench: Evaluating Reinforcement Learning with Large Language Model Rollouts | May 15, 2025 | Continual Learning, Language Modeling | Code Available | 1 |
| Reinforcement Learning for Individual Optimal Policy from Heterogeneous Data | May 14, 2025 | Offline RL, reinforcement-learning | Unverified | 0 |
| Feasibility-Aware Pessimistic Estimation: Toward Long-Horizon Safety in Offline RL | May 13, 2025 | Offline RL, Safe Reinforcement Learning | Unverified | 0 |
| What Matters for Batch Online Reinforcement Learning in Robotics? | May 12, 2025 | Imitation Learning, Offline RL | Unverified | 0 |
| Cache-Efficient Posterior Sampling for Reinforcement Learning with LLM-Derived Priors Across Discrete and Continuous Domains | May 12, 2025 | continuous-control, Continuous Control | Unverified | 0 |
| Video-Enhanced Offline Reinforcement Learning: A Model-Based Approach | May 10, 2025 | Autonomous Driving, Offline RL | Unverified | 0 |
| Pretraining a Shared Q-Network for Data-Efficient Offline Reinforcement Learning | May 9, 2025 | D4RL, Offline RL | Unverified | 0 |
| Taming OOD Actions for Offline Reinforcement Learning: An Advantage-Based Approach | May 8, 2025 | D4RL, Decision Making | Unverified | 0 |
| Exploring the Potential of Offline RL for Reasoning in LLMs: A Preliminary Study | May 4, 2025 | Offline RL, Reinforcement Learning (RL) | Unverified | 0 |
| Analytic Energy-Guided Policy Optimization for Offline Reinforcement Learning | May 3, 2025 | D4RL, Offline RL | Unverified | 0 |
| Offline Robotic World Model: Learning Robotic Policies without a Physics Simulator | Apr 23, 2025 | Offline RL, Reinforcement Learning (RL) | Unverified | 0 |
| VIPO: Value Function Inconsistency Penalized Offline Reinforcement Learning | Apr 16, 2025 | D4RL, Offline RL | Unverified | 0 |
| A Clean Slate for Offline Reinforcement Learning | Apr 15, 2025 | Offline RL, reinforcement-learning | Code Available | 3 |
| Towards Optimal Differentially Private Regret Bounds in Linear MDPs | Apr 12, 2025 | Offline RL, Reinforcement Learning (RL) | Unverified | 0 |
| Decision SpikeFormer: Spike-Driven Transformer for Decision Making | Apr 4, 2025 | D4RL, Decision Making | Unverified | 0 |
| Model-Based Offline Reinforcement Learning with Adversarial Data Augmentation | Mar 26, 2025 | D4RL, Data Augmentation | Unverified | 0 |
| Offline Reinforcement Learning with Discrete Diffusion Skills | Mar 26, 2025 | Decoder, Offline RL | Unverified | 0 |