| Differentiable Tree Search Network | Jan 22, 2024 | Decision Making, Inductive Bias | Code Available | 5 |
| A Clean Slate for Offline Reinforcement Learning | Apr 15, 2025 | Offline RL, reinforcement-learning | Code Available | 3 |
| Flow Q-Learning | Feb 4, 2025 | Action Generation, D4RL | Code Available | 3 |
| DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning | Jun 14, 2024 | Offline RL | Code Available | 3 |
| Is Value Learning Really the Main Bottleneck in Offline RL? | Jun 13, 2024 | Imitation Learning, Offline RL | Code Available | 3 |
| Diffusion Guidance Is a Controllable Policy Improvement Operator | May 29, 2025 | Offline RL | Code Available | 2 |
| What Makes a Good Diffusion Planner for Decision Making? | Mar 1, 2025 | Action Generation, Decision Making | Code Available | 2 |
| Offline Reinforcement Learning for LLM Multi-Step Reasoning | Dec 20, 2024 | GSM8K, Math | Code Available | 2 |
| Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data | Dec 10, 2024 | Offline RL, Reinforcement Learning (RL) | Code Available | 2 |
| Revisiting Generative Policies: A Simpler Reinforcement Learning Algorithmic Perspective | Dec 2, 2024 | Density Estimation, Offline RL | Code Available | 2 |
| Pretrained LLM Adapted with LoRA as a Decision Transformer for Offline RL in Quantitative Trading | Nov 26, 2024 | Offline RL, parameter-efficient fine-tuning | Code Available | 2 |
| LongReward: Improving Long-context Large Language Models with AI Feedback | Oct 28, 2024 | Offline RL, Reinforcement Learning (RL) | Code Available | 2 |
| Enhancing Sample Efficiency and Exploration in Reinforcement Learning through the Integration of Diffusion Models and Proximal Policy Optimization | Sep 2, 2024 | Diversity, Offline RL | Code Available | 2 |
| Hokoff: Real Game Dataset from Honor of Kings and its Offline Reinforcement Learning Benchmarks | Aug 20, 2024 | Multi-agent Reinforcement Learning, Multi-Task Learning | Code Available | 2 |
| A Simulation Benchmark for Autonomous Racing with Large-Scale Human Data | Jul 23, 2024 | Autonomous Driving, Autonomous Racing | Code Available | 2 |
| Any-step Dynamics Model Improves Future Predictions for Online and Offline Reinforcement Learning | May 27, 2024 | Gym halfcheetah-medium, Gym halfcheetah-medium-expert | Code Available | 2 |
| Diffusion-based Reinforcement Learning via Q-weighted Variational Policy Optimization | May 25, 2024 | continuous-control, Continuous Control | Code Available | 2 |
| Unsupervised Zero-Shot Reinforcement Learning via Functional Reward Encodings | Feb 27, 2024 | Diversity, Offline RL | Code Available | 2 |
| Deep Generative Models for Offline Policy Learning: Tutorial, Survey, and Perspectives on Future Directions | Feb 21, 2024 | Decision Making, Imitation Learning | Code Available | 2 |
| Safe Offline Reinforcement Learning with Feasibility-Guided Diffusion Model | Jan 19, 2024 | Offline RL, reinforcement-learning | Code Available | 2 |
| AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning | Aug 7, 2023 | Offline RL, reinforcement-learning | Code Available | 2 |
| FurnitureBench: Reproducible Real-World Benchmark for Long-Horizon Complex Manipulation | May 22, 2023 | Imitation Learning, Motion Planning | Code Available | 2 |
| Dungeons and Data: A Large-Scale NetHack Dataset | Nov 1, 2022 | Decision Making, NetHack | Code Available | 2 |
| Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning | Aug 12, 2022 | D4RL, Offline RL | Code Available | 2 |
| Towards Human-Level Bimanual Dexterous Manipulation with Reinforcement Learning | Jun 17, 2022 | Few-Shot Learning, Offline RL | Code Available | 2 |
| Challenges and Opportunities in Offline Reinforcement Learning from Visual Observations | Jun 9, 2022 | Benchmarking, continuous-control | Code Available | 2 |
| Offline RL for Natural Language Generation with Implicit Language Q Learning | Jun 5, 2022 | Language Modelling, Offline RL | Code Available | 2 |
| CHAI: A CHatbot AI for Task-Oriented Dialogue with Offline Reinforcement Learning | Apr 18, 2022 | Chatbot, Offline RL | Code Available | 2 |
| VRL3: A Data-Driven Framework for Visual Deep Reinforcement Learning | Feb 17, 2022 | Deep Reinforcement Learning, Offline RL | Code Available | 2 |
| Flowformer: Linearizing Transformers with Conservation Flows | Feb 13, 2022 | D4RL, Offline RL | Code Available | 2 |
| Rethinking Attention with Performers | Sep 30, 2020 | D4RL, Image Generation | Code Available | 2 |
| D4RL: Datasets for Deep Data-Driven Reinforcement Learning | Apr 15, 2020 | D4RL, Offline RL | Code Available | 2 |
| Reformer: The Efficient Transformer | Jan 13, 2020 | D4RL, Image Generation | Code Available | 2 |
| ImagineBench: Evaluating Reinforcement Learning with Large Language Model Rollouts | May 15, 2025 | Continual Learning, Language Modeling | Code Available | 1 |
| NeoRL-2: Near Real-World Benchmarks for Offline Reinforcement Learning with Extended Realistic Scenarios | Mar 25, 2025 | Benchmarking, Offline RL | Code Available | 1 |
| GNN-DT: Graph Neural Network Enhanced Decision Transformer for Efficient Optimization in Dynamic Environments | Feb 3, 2025 | Efficient Exploration, Graph Neural Network | Code Available | 1 |
| Constraint-Adaptive Policy Switching for Offline Safe Reinforcement Learning | Dec 25, 2024 | Decision Making, Offline RL | Code Available | 1 |
| Are Expressive Models Truly Necessary for Offline RL? | Dec 15, 2024 | D4RL, Offline RL | Code Available | 1 |
| In-Dataset Trajectory Return Regularization for Offline Preference-based Reinforcement Learning | Dec 12, 2024 | Offline RL | Code Available | 1 |
| Doubly Mild Generalization for Offline Reinforcement Learning | Nov 12, 2024 | MuJoCo, Offline RL | Code Available | 1 |
| Offline Reinforcement Learning with OOD State Correction and OOD Action Suppression | Oct 25, 2024 | Offline RL, Reinforcement Learning (RL) | Code Available | 1 |
| Steering Your Generalists: Improving Robotic Foundation Models via Value Guidance | Oct 17, 2024 | Offline RL, Re-Ranking | Code Available | 1 |
| Scaling Offline Model-Based RL via Jointly-Optimized World-Action Model Pretraining | Oct 1, 2024 | Atari Games, model | Code Available | 1 |
| DMC-VB: A Benchmark for Representation Learning for Control with Visual Distractors | Sep 26, 2024 | continuous-control, Continuous Control | Code Available | 1 |
| PlanDQ: Hierarchical Plan Orchestration via D-Conductor and Q-Performer | Jun 10, 2024 | continuous-control, Continuous Control | Code Available | 1 |
| Strategically Conservative Q-Learning | Jun 6, 2024 | D4RL, Offline RL | Code Available | 1 |
| Diffusion Policies creating a Trust Region for Offline Reinforcement Learning | May 30, 2024 | D4RL, Denoising | Code Available | 1 |
| Reinforcement Learning in Dynamic Treatment Regimes Needs Critical Reexamination | May 28, 2024 | Offline RL, reinforcement-learning | Code Available | 1 |
| Offline-Boosted Actor-Critic: Adaptively Blending Optimal Historical Behaviors in Deep Off-Policy RL | May 28, 2024 | Offline RL, Reinforcement Learning (RL) | Code Available | 1 |
| Q-value Regularized Transformer for Offline Reinforcement Learning | May 27, 2024 | D4RL, Offline RL | Code Available | 1 |