| PyTupli: A Scalable Infrastructure for Collaborative Offline Reinforcement Learning Projects | May 22, 2025 | Offline RLReinforcement Learning (RL) | CodeCode Available | 0 |
| Think-J: Learning to Think for Generative LLM-as-a-Judge | May 20, 2025 | Offline RLReinforcement Learning (RL) | CodeCode Available | 0 |
| Unearthing Gems from Stones: Policy Optimization with Negative Sample Augmentation for LLM Reasoning | May 20, 2025 | MathOffline RL | —Unverified | 0 |
| Your Offline Policy is Not Trustworthy: Bilevel Reinforcement Learning for Sequential Portfolio Optimization | May 19, 2025 | Offline RLPortfolio Optimization | —Unverified | 0 |
| Prior-Guided Diffusion Planning for Offline Reinforcement Learning | May 16, 2025 | Decision MakingDenoising | —Unverified | 0 |
| Reinforcement Learning for Individual Optimal Policy from Heterogeneous Data | May 14, 2025 | Offline RLreinforcement-learning | —Unverified | 0 |
| Feasibility-Aware Pessimistic Estimation: Toward Long-Horizon Safety in Offline RL | May 13, 2025 | Offline RLSafe Reinforcement Learning | —Unverified | 0 |
| What Matters for Batch Online Reinforcement Learning in Robotics? | May 12, 2025 | Imitation LearningOffline RL | —Unverified | 0 |
| Cache-Efficient Posterior Sampling for Reinforcement Learning with LLM-Derived Priors Across Discrete and Continuous Domains | May 12, 2025 | continuous-controlContinuous Control | —Unverified | 0 |
| Video-Enhanced Offline Reinforcement Learning: A Model-Based Approach | May 10, 2025 | Autonomous DrivingOffline RL | —Unverified | 0 |
| Pretraining a Shared Q-Network for Data-Efficient Offline Reinforcement Learning | May 9, 2025 | D4RLOffline RL | —Unverified | 0 |
| Taming OOD Actions for Offline Reinforcement Learning: An Advantage-Based Approach | May 8, 2025 | D4RLDecision Making | —Unverified | 0 |
| Exploring the Potential of Offline RL for Reasoning in LLMs: A Preliminary Study | May 4, 2025 | Offline RLReinforcement Learning (RL) | —Unverified | 0 |
| Analytic Energy-Guided Policy Optimization for Offline Reinforcement Learning | May 3, 2025 | D4RLOffline RL | —Unverified | 0 |
| Offline Robotic World Model: Learning Robotic Policies without a Physics Simulator | Apr 23, 2025 | Offline RLReinforcement Learning (RL) | —Unverified | 0 |
| VIPO: Value Function Inconsistency Penalized Offline Reinforcement Learning | Apr 16, 2025 | D4RLOffline RL | —Unverified | 0 |
| Towards Optimal Differentially Private Regret Bounds in Linear MDPs | Apr 12, 2025 | Offline RLReinforcement Learning (RL) | —Unverified | 0 |
| Decision SpikeFormer: Spike-Driven Transformer for Decision Making | Apr 4, 2025 | D4RLDecision Making | —Unverified | 0 |
| Model-Based Offline Reinforcement Learning with Adversarial Data Augmentation | Mar 26, 2025 | D4RLData Augmentation | —Unverified | 0 |
| Offline Reinforcement Learning with Discrete Diffusion Skills | Mar 26, 2025 | DecoderOffline RL | —Unverified | 0 |
| Behaviour Discovery and Attribution for Explainable Reinforcement Learning | Mar 19, 2025 | Offline RLreinforcement-learning | —Unverified | 0 |
| Evaluation-Time Policy Switching for Offline Reinforcement Learning | Mar 15, 2025 | Behavioural cloningOffline RL | —Unverified | 0 |
| The Pitfalls of Imitation Learning when Actions are Continuous | Mar 12, 2025 | ChunkingImitation Learning | —Unverified | 0 |
| Policy Regularization on Globally Accessible States in Cross-Dynamics Reinforcement Learning | Mar 10, 2025 | Imitation LearningOffline RL | —Unverified | 0 |
| Policy Constraint by Only Support Constraint for Offline Reinforcement Learning | Mar 7, 2025 | Offline RLreinforcement-learning | CodeCode Available | 0 |