| Unearthing Gems from Stones: Policy Optimization with Negative Sample Augmentation for LLM Reasoning | May 20, 2025 | Math, Offline RL | —Unverified | 0 | 0 |
| Unified Emulation-Simulation Training Environment for Autonomous Cyber Agents | Apr 3, 2023 | Deep Reinforcement Learning, Offline RL | —Unverified | 0 | 0 |
| Unsupervised-to-Online Reinforcement Learning | Aug 27, 2024 | Offline RL, reinforcement-learning | —Unverified | 0 | 0 |
| Urban-Focused Multi-Task Offline Reinforcement Learning with Contrastive Data Sharing | Jun 20, 2024 | Autonomous Driving, Data Augmentation | —Unverified | 0 | 0 |
| User-Interactive Offline Reinforcement Learning | May 21, 2022 | Offline RL, reinforcement-learning | —Unverified | 0 | 0 |
| Adaptive Q-Aid for Conditional Supervised Learning in Offline Reinforcement Learning | Feb 3, 2024 | Offline RL, Reinforcement Learning (RL) | —Unverified | 0 | 0 |
| Value Penalized Q-Learning for Recommender Systems | Oct 15, 2021 | Offline RL, Q-Learning | —Unverified | 0 | 0 |
| Variational oracle guiding for reinforcement learning | Sep 29, 2021 | Decision Making, Deep Reinforcement Learning | —Unverified | 0 | 0 |
| Video-Enhanced Offline Reinforcement Learning: A Model-Based Approach | May 10, 2025 | Autonomous Driving, Offline RL | —Unverified | 0 | 0 |
| VIPO: Value Function Inconsistency Penalized Offline Reinforcement Learning | Apr 16, 2025 | D4RL, Offline RL | —Unverified | 0 | 0 |