| Task-Completion Dialogue Policy Learning via Monte Carlo Tree Search with Dueling Network | Nov 1, 2020 | Model-based Reinforcement Learning, reinforcement-learning | —Unverified | 0 | 0 |
| TD-M(PC)^2: Improving Temporal Difference MPC Through Policy Constraint | Feb 5, 2025 | continuous-control, Continuous Control | —Unverified | 0 | 0 |
| Temporal Distance-aware Transition Augmentation for Offline Model-based Reinforcement Learning | May 19, 2025 | D4RL, Model-based Reinforcement Learning | —Unverified | 0 | 0 |
| The Alignment Ceiling: Objective Mismatch in Reinforcement Learning from Human Feedback | Oct 31, 2023 | GSM8K, MMLU | —Unverified | 0 | 0 |
| The Bottleneck Simulator: A Model-based Deep Reinforcement Learning Approach | Jul 12, 2018 | Deep Reinforcement Learning, Model-based Reinforcement Learning | —Unverified | 0 | 0 |
| The detour problem in a stochastic environment: Tolman revisited | Sep 27, 2017 | Model-based Reinforcement Learning, Reinforcement Learning | —Unverified | 0 | 0 |
| The Effect of Planning Shape on Dyna-style Planning in High-dimensional State Spaces | Jun 5, 2018 | Atari Games, Model-based Reinforcement Learning | —Unverified | 0 | 0 |
| The growth and form of knowledge networks by kinesthetic curiosity | Jun 4, 2020 | Form, Model-based Reinforcement Learning | —Unverified | 0 | 0 |
| The Interpretability of Codebooks in Model-Based Reinforcement Learning is Limited | Jul 28, 2024 | Deep Reinforcement Learning, Disentanglement | —Unverified | 0 | 0 |
| Theoretically Guaranteed Policy Improvement Distilled from Model-Based Planning | Jul 24, 2023 | continuous-control, Continuous Control | —Unverified | 0 | 0 |