| Think-J: Learning to Think for Generative LLM-as-a-Judge | May 20, 2025 | Offline RL, Reinforcement Learning (RL) | Code Available | 0 |
| Your Offline Policy is Not Trustworthy: Bilevel Reinforcement Learning for Sequential Portfolio Optimization | May 19, 2025 | Offline RL, Portfolio Optimization | Unverified | 0 |
| Prior-Guided Diffusion Planning for Offline Reinforcement Learning | May 16, 2025 | Decision Making, Denoising | Unverified | 0 |
| ImagineBench: Evaluating Reinforcement Learning with Large Language Model Rollouts | May 15, 2025 | Continual Learning, Language Modeling | Code Available | 1 |
| Reinforcement Learning for Individual Optimal Policy from Heterogeneous Data | May 14, 2025 | Offline RL, reinforcement-learning | Unverified | 0 |
| Feasibility-Aware Pessimistic Estimation: Toward Long-Horizon Safety in Offline RL | May 13, 2025 | Offline RL, Safe Reinforcement Learning | Unverified | 0 |
| What Matters for Batch Online Reinforcement Learning in Robotics? | May 12, 2025 | Imitation Learning, Offline RL | Unverified | 0 |
| Cache-Efficient Posterior Sampling for Reinforcement Learning with LLM-Derived Priors Across Discrete and Continuous Domains | May 12, 2025 | continuous-control, Continuous Control | Unverified | 0 |
| Video-Enhanced Offline Reinforcement Learning: A Model-Based Approach | May 10, 2025 | Autonomous Driving, Offline RL | Unverified | 0 |
| Pretraining a Shared Q-Network for Data-Efficient Offline Reinforcement Learning | May 9, 2025 | D4RL, Offline RL | Unverified | 0 |