| Title | Date | Tasks | Code | |
|---|---|---|---|---|
| Optimistic Critic Reconstruction and Constrained Fine-Tuning for General Offline-to-Online RL | Dec 25, 2024 | Offline RL, Reinforcement Learning (RL) | Code available | 0 |
| Improving Multi-Step Reasoning Abilities of Large Language Models with Direct Advantage Policy Optimization | Dec 24, 2024 | Offline RL, Reinforcement Learning (RL) | Unverified | 0 |
| Offline Reinforcement Learning for LLM Multi-Step Reasoning | Dec 20, 2024 | GSM8K, Math | Code available | 2 |
| AdaCred: Adaptive Causal Decision Transformers with Feature Crediting | Dec 19, 2024 | Attribute, Imitation Learning | Unverified | 0 |
| Are Expressive Models Truly Necessary for Offline RL? | Dec 15, 2024 | D4RL, Offline RL | Code available | 1 |
| In-Dataset Trajectory Return Regularization for Offline Preference-based Reinforcement Learning | Dec 12, 2024 | Offline RL | Code available | 1 |
| Latent Safety-Constrained Policy Approach for Safe Offline Reinforcement Learning | Dec 11, 2024 | Autonomous Driving, Offline RL | Code available | 0 |
| Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data | Dec 10, 2024 | Offline RL, Reinforcement Learning (RL) | Code available | 2 |
| Policy Agnostic RL: Offline RL and Online RL Fine-Tuning of Any Class and Backbone | Dec 9, 2024 | global-optimization, Imitation Learning | Unverified | 0 |
| Reinforcement Learning: An Overview | Dec 6, 2024 | Decision Making, Deep Reinforcement Learning | Unverified | 0 |