| End-to-End Offline Goal-Oriented Dialog Policy Learning via Policy Gradient | Dec 7, 2017 | DecoderGoal-Oriented Dialog | —Unverified | 0 |
| ComaDICE: Offline Cooperative Multi-Agent Reinforcement Learning with Stationary Distribution Shift Regularization | Oct 2, 2024 | MuJoCoMulti-agent Reinforcement Learning | —Unverified | 0 |
| From Novelty to Imitation: Self-Distilled Rewards for Offline Reinforcement Learning | Jul 17, 2025 | D4RLOffline RL | —Unverified | 0 |
| Enabling A Network AI Gym for Autonomous Cyber Agents | Apr 3, 2023 | Deep Reinforcement LearningOffline RL | —Unverified | 0 |
| Empowering Embodied Visual Tracking with Visual Foundation Models and Offline RL | Apr 15, 2024 | GPUOffline RL | —Unverified | 0 |
| Augmenting Offline RL with Unlabeled Data | Jun 11, 2024 | Offline RLTransfer Learning | —Unverified | 0 |
| EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL | Jul 21, 2020 | D4RLDecision Making | —Unverified | 0 |
| CLUE: Calibrated Latent Guidance for Offline Reinforcement Learning | Jun 23, 2023 | Imitation LearningOffline RL | —Unverified | 0 |
| Efficient Online RL Fine Tuning with Offline Pre-trained Policy Only | May 22, 2025 | Imitation LearningOffline RL | —Unverified | 0 |
| A Fast Convergence Theory for Offline Decision Making | Jun 3, 2024 | Decision MakingOffline RL | —Unverified | 0 |