| What is Essential for Unseen Goal Generalization of Offline Goal-conditioned RL? | May 30, 2023 | Imitation Learning, Offline RL | Code Available | 0 |
| Robust Reinforcement Learning Objectives for Sequential Recommender Systems | May 30, 2023 | Offline RL, Recommendation Systems | Code Available | 0 |
| Reinforcement Learning with Human Feedback: Learning Dynamic Choices via Pessimism | May 29, 2023 | Decision Making, Econometrics | Unverified | 0 |
| Beyond Reward: Offline Preference-guided Policy Optimization | May 25, 2023 | Offline RL, reinforcement-learning | Code Available | 0 |
| The Benefits of Being Distributional: Small-Loss Bounds for Reinforcement Learning | May 25, 2023 | Distributional Reinforcement Learning, Offline RL | Code Available | 0 |
| Offline Primal-Dual Reinforcement Learning for Linear MDPs | May 22, 2023 | Offline RL, reinforcement-learning | Unverified | 0 |
| Offline Reinforcement Learning with Additional Covering Distributions | May 22, 2023 | Inductive Bias, Offline RL | Unverified | 0 |
| Bayesian Reparameterization of Reward-Conditioned Reinforcement Learning with Energy-based Models | May 18, 2023 | MuJoCo, Offline RL | Unverified | 0 |
| SLiC-HF: Sequence Likelihood Calibration with Human Feedback | May 17, 2023 | Language Modeling, Language Modelling | Unverified | 0 |
| Reward-agnostic Fine-tuning: Provable Statistical Benefits of Hybrid Reinforcement Learning | May 17, 2023 | Offline RL, reinforcement-learning | Unverified | 0 |