| ConRFT: A Reinforced Fine-tuning Method for VLA Models via Consistency Policy | Feb 8, 2025 | Q-Learning, Safe Exploration | Code Available | 3 |
| Flow Q-Learning | Feb 4, 2025 | Action Generation, D4RL | Code Available | 3 |
| Streaming Deep Reinforcement Learning Finally Works | Oct 18, 2024 | Atari Games, Deep Reinforcement Learning | Code Available | 3 |
| Simplifying Deep Temporal Difference Learning | Jul 5, 2024 | Q-Learning, Reinforcement Learning (RL) | Code Available | 3 |
| Digi-Q: Learning Q-Value Functions for Training Device-Control Agents | Feb 13, 2025 | Q-Learning, Reinforcement Learning (RL) | Code Available | 2 |
| Pretrained LLM Adapted with LoRA as a Decision Transformer for Offline RL in Quantitative Trading | Nov 26, 2024 | Offline RL, parameter-efficient fine-tuning | Code Available | 2 |
| Rethinking Data Augmentation for Robust LiDAR Semantic Segmentation in Adverse Weather | Jul 2, 2024 | Data Augmentation, LIDAR Semantic Segmentation | Code Available | 2 |
| Safe Multi-Agent Reinforcement Learning with Bilevel Optimization in Autonomous Driving | May 28, 2024 | Autonomous Driving, Bilevel Optimization | Code Available | 2 |
| Ensembling Prioritized Hybrid Policies for Multi-agent Pathfinding | Mar 12, 2024 | Multi-Agent Path Finding, Multi-agent Reinforcement Learning | Code Available | 2 |
| Efficient Episodic Memory Utilization of Cooperative Multi-Agent Reinforcement Learning | Mar 2, 2024 | Decoder, Multi-agent Reinforcement Learning | Code Available | 2 |
| ACE: Cooperative Multi-agent Q-learning with Bidirectional Action-Dependency | Nov 29, 2022 | Decision Making, Multi-agent Reinforcement Learning | Code Available | 2 |
| Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning | Aug 12, 2022 | D4RL, Offline RL | Code Available | 2 |
| Offline RL for Natural Language Generation with Implicit Language Q Learning | Jun 5, 2022 | Language Modelling, Offline RL | Code Available | 2 |
| rlpyt: A Research Code Base for Deep Reinforcement Learning in PyTorch | Sep 3, 2019 | Deep Reinforcement Learning, Q-Learning | Code Available | 2 |
| POPGym Arcade: Parallel Pixelated POMDPs | Mar 3, 2025 | counterfactual, Imitation Learning | Code Available | 1 |
| Zonal RL-RRT: Integrated RL-RRT Path Planning with Collision Probability and Zone Connectivity | Oct 31, 2024 | MuJoCo, Q-Learning | Code Available | 1 |
| Reward-free World Models for Online Imitation Learning | Oct 17, 2024 | Imitation Learning, Q-Learning | Code Available | 1 |
| Reinforcement Learning in High-frequency Market Making | Jul 14, 2024 | Q-Learning, reinforcement-learning | Code Available | 1 |
| Q-Adapter: Customizing Pre-trained LLMs to New Preferences with Forgetting Mitigation | Jul 4, 2024 | Q-Learning, reinforcement-learning | Code Available | 1 |
| PlanDQ: Hierarchical Plan Orchestration via D-Conductor and Q-Performer | Jun 10, 2024 | continuous-control, Continuous Control | Code Available | 1 |
| Strategically Conservative Q-Learning | Jun 6, 2024 | D4RL, Offline RL | Code Available | 1 |
| Towards Universal and Black-Box Query-Response Only Attack on LLMs with QROA | Jun 4, 2024 | Q-Learning | Code Available | 1 |
| Diffusion Policies creating a Trust Region for Offline Reinforcement Learning | May 30, 2024 | D4RL, Denoising | Code Available | 1 |
| A Recipe for Unbounded Data Augmentation in Visual Reinforcement Learning | May 27, 2024 | Data Augmentation, Q-Learning | Code Available | 1 |
| Research on Robot Path Planning Based on Reinforcement Learning | Apr 22, 2024 | Q-Learning, reinforcement-learning | Code Available | 1 |
| Laser Learning Environment: A new environment for coordination-critical multi-agent tasks | Apr 4, 2024 | Multi-agent Reinforcement Learning, Q-Learning | Code Available | 1 |
| Towards Optimal Adversarial Robust Q-learning with Bellman Infinity-error | Feb 3, 2024 | Adversarial Robustness, Deep Reinforcement Learning | Code Available | 1 |
| Multi-Agent Reinforcement Learning via Distributed MPC as a Function Approximator | Dec 8, 2023 | Model Predictive Control, Multi-agent Reinforcement Learning | Code Available | 1 |
| Optimistic Multi-Agent Policy Gradient | Nov 3, 2023 | MuJoCo, Q-Learning | Code Available | 1 |
| Free from Bellman Completeness: Trajectory Stitching via Model-based Return-conditioned Supervised Learning | Oct 30, 2023 | Decision Making, Offline RL | Code Available | 1 |
| Towards Robust Offline Reinforcement Learning under Diverse Data Corruption | Oct 19, 2023 | Offline RL, Q-Learning | Code Available | 1 |
| Deep Reinforcement Learning-based Intelligent Traffic Signal Controls with Optimized CO2 emissions | Oct 19, 2023 | Deep Reinforcement Learning, Q-Learning | Code Available | 1 |
| Boosting Continuous Control with Consistency Policy | Oct 10, 2023 | continuous-control, Continuous Control | Code Available | 1 |
| PGDQN: Preference-Guided Deep Q-Network | Oct 3, 2023 | Atari Games, Benchmarking | Code Available | 1 |
| Counterfactual Conservative Q Learning for Offline Multi-agent Reinforcement Learning | Sep 22, 2023 | counterfactual, Multi-agent Reinforcement Learning | Code Available | 1 |
| Reasoning with Latent Diffusion in Offline Reinforcement Learning | Sep 12, 2023 | D4RL, Offline RL | Code Available | 1 |
| Robust Multi-Agent Reinforcement Learning with State Uncertainty | Jul 30, 2023 | Multi-agent Reinforcement Learning, Q-Learning | Code Available | 1 |
| MADiff: Offline Multi-agent Learning with Diffusion Models | May 27, 2023 | Offline RL, Q-Learning | Code Available | 1 |
| When should we prefer Decision Transformers for Offline Reinforcement Learning? | May 23, 2023 | D4RL, Imitation Learning | Code Available | 1 |
| IDQL: Implicit Q-Learning as an Actor-Critic Method with Diffusion Policies | Apr 20, 2023 | Offline RL, Q-Learning | Code Available | 1 |
| Offline RL with No OOD Actions: In-Sample Learning via Implicit Value Regularization | Mar 28, 2023 | D4RL, Offline RL | Code Available | 1 |
| Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning | Mar 9, 2023 | Offline RL, Q-Learning | Code Available | 1 |
| LS-IQ: Implicit Reward Regularization for Inverse Reinforcement Learning | Mar 1, 2023 | Continuous Control, Imitation Learning | Code Available | 1 |
| TransfQMix: Transformers for Leveraging the Graph Structure of Multi-Agent Reinforcement Learning Problems | Jan 13, 2023 | Multi-agent Reinforcement Learning, Q-Learning | Code Available | 1 |
| Extreme Q-Learning: MaxEnt RL without Entropy | Jan 5, 2023 | D4RL, Deep Reinforcement Learning | Code Available | 1 |
| Learning a Generic Value-Selection Heuristic Inside a Constraint Programming Solver | Jan 5, 2023 | Graph Neural Network, Q-Learning | Code Available | 1 |
| Solving Continuous Control via Q-learning | Oct 22, 2022 | continuous-control, Continuous Control | Code Available | 1 |
| Sustainable Online Reinforcement Learning for Auto-bidding | Oct 13, 2022 | Q-Learning, reinforcement-learning | Code Available | 1 |
| Hybrid RL: Using Both Offline and Online Data Can Make RL Efficient | Oct 13, 2022 | Montezuma's Revenge, Q-Learning | Code Available | 1 |
| Pre-Training for Robots: Offline RL Enables Learning New Tasks from a Handful of Trials | Oct 11, 2022 | Offline RL, Q-Learning | Code Available | 1 |