| End-to-End Offline Goal-Oriented Dialog Policy Learning via Policy Gradient | Dec 7, 2017 | Decoder, Goal-Oriented Dialog | —Unverified | 0 | 0 |
| ComaDICE: Offline Cooperative Multi-Agent Reinforcement Learning with Stationary Distribution Shift Regularization | Oct 2, 2024 | MuJoCo, Multi-agent Reinforcement Learning | —Unverified | 0 | 0 |
| Enabling A Network AI Gym for Autonomous Cyber Agents | Apr 3, 2023 | Deep Reinforcement Learning, Offline RL | —Unverified | 0 | 0 |
| Empowering Embodied Visual Tracking with Visual Foundation Models and Offline RL | Apr 15, 2024 | GPU, Offline RL | —Unverified | 0 | 0 |
| Augmenting Offline RL with Unlabeled Data | Jun 11, 2024 | Offline RL, Transfer Learning | —Unverified | 0 | 0 |
| EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL | Jul 21, 2020 | D4RL, Decision Making | —Unverified | 0 | 0 |
| Efficient Online RL Fine Tuning with Offline Pre-trained Policy Only | May 22, 2025 | Imitation Learning, Offline RL | —Unverified | 0 | 0 |
| CLUE: Calibrated Latent Guidance for Offline Reinforcement Learning | Jun 23, 2023 | Imitation Learning, Offline RL | —Unverified | 0 | 0 |
| A Fast Convergence Theory for Offline Decision Making | Jun 3, 2024 | Decision Making, Offline RL | —Unverified | 0 | 0 |
| A Fully Data-Driven Approach for Realistic Traffic Signal Control Using Offline Reinforcement Learning | Nov 27, 2023 | Offline RL, Reinforcement Learning (RL) | —Unverified | 0 | 0 |
| ChiPFormer: Transferable Chip Placement via Offline Decision Transformer | Jun 26, 2023 | Offline RL, Reinforcement Learning (RL) | —Unverified | 0 | 0 |
| Efficient Imitation Learning with Conservative World Models | May 21, 2024 | Imitation Learning, Offline RL | —Unverified | 0 | 0 |
| Optimal Uniform OPE and Model-based Offline Reinforcement Learning in Time-Homogeneous, Reward-Free and Task-Agnostic Settings | May 13, 2021 | Offline RL | —Unverified | 0 | 0 |
| Dual Generator Offline Reinforcement Learning | Nov 2, 2022 | Offline RL, reinforcement-learning | —Unverified | 0 | 0 |
| A Survey on Model-based Reinforcement Learning | Jun 19, 2022 | Decision Making, model | —Unverified | 0 | 0 |
| DRDT3: Diffusion-Refined Decision Test-Time Training Model | Jan 12, 2025 | D4RL, Offline RL | —Unverified | 0 | 0 |
| DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization | Dec 9, 2021 | Atari Games, D4RL | —Unverified | 0 | 0 |
| CDSA: Conservative Denoising Score-based Algorithm for Offline Reinforcement Learning | Jun 11, 2024 | D4RL, Denoising | —Unverified | 0 | 0 |
| Double Pessimism is Provably Efficient for Distributionally Robust Offline Reinforcement Learning: Generic Algorithm and Robust Partial Coverage | May 16, 2023 | Offline RL | —Unverified | 0 | 0 |
| A Survey of Zero-shot Generalisation in Deep Reinforcement Learning | Nov 18, 2021 | Deep Reinforcement Learning, Offline RL | —Unverified | 0 | 0 |
| DOMAIN: MilDly COnservative Model-BAsed OfflINe Reinforcement Learning | Sep 16, 2023 | D4RL, model | —Unverified | 0 | 0 |
| Domain Generalization for Robust Model-Based Offline Reinforcement Learning | Nov 27, 2022 | Domain Generalization, Offline RL | —Unverified | 0 | 0 |
| Causal prompting model-based offline reinforcement learning | Jun 3, 2024 | model, Offline RL | —Unverified | 0 | 0 |
| A Strong Baseline for Batch Imitation Learning | Feb 6, 2023 | continuous-control, Continuous Control | —Unverified | 0 | 0 |
| Adversarially Trained Weighted Actor-Critic for Safe Offline Reinforcement Learning | Jan 1, 2024 | continuous-control, Continuous Control | —Unverified | 0 | 0 |