| Unified Preference Optimization: Language Model Alignment Beyond the Preference Frontier | May 28, 2024 | Language Modeling | Unverified | 0 |
| Trajectory Data Suffices for Statistically Efficient Learning in Offline RL with Linear q^π-Realizability and Concentrability | May 27, 2024 | Computational Efficiency, Offline RL | Unverified | 0 |
| OPERA: Automatic Offline Policy Evaluation with Re-weighted Aggregates of Multiple Estimators | May 27, 2024 | Decision Making, Offline RL | Unverified | 0 |
| Exclusively Penalized Q-learning for Offline Reinforcement Learning | May 23, 2024 | Offline RL, Q-Learning | Unverified | 0 |
| Offline Reinforcement Learning from Datasets with Structured Non-Stationarity | May 23, 2024 | Continuous Control | Code Available | 0 |
| Offline RL via Feature-Occupancy Gradient Ascent | May 22, 2024 | Offline RL | Unverified | 0 |
| Efficient Imitation Learning with Conservative World Models | May 21, 2024 | Imitation Learning, Offline RL | Unverified | 0 |
| Is Mamba Compatible with Trajectory Optimization in Offline Reinforcement Learning? | May 20, 2024 | Atari Games, Mamba | Code Available | 0 |
| Towards Robust Policy: Enhancing Offline Reinforcement Learning with Adversarial Attacks and Defenses | May 18, 2024 | D4RL, Offline RL | Unverified | 0 |
| Ensemble Successor Representations for Task Generalization in Offline-to-Online Reinforcement Learning | May 12, 2024 | Offline RL, Reinforcement Learning (RL) | Unverified | 0 |