| Adaptive Ensemble Q-learning: Minimizing Estimation Bias via Error Feedback | Jun 20, 2023 | MuJoCo, Q-Learning | Unverified | 0 |
| Evolutionary Strategy Guided Reinforcement Learning via MultiBuffer Communication | Jun 20, 2023 | Deep Reinforcement Learning, Evolutionary Algorithms | Unverified | 0 |
| Surfer: Progressive Reasoning with World Models for Robotic Manipulation | Jun 20, 2023 | Decision Making, MuJoCo | Unverified | 0 |
| Maximum Entropy Heterogeneous-Agent Reinforcement Learning | Jun 19, 2023 | MuJoCo, Multi-agent Reinforcement Learning | Code Available | 2 |
| AdaStop: adaptive statistical testing for sound comparisons of Deep RL agents | Jun 19, 2023 | Deep Reinforcement Learning, MuJoCo | Code Available | 0 |
| Mimicking Better by Matching the Approximate Action Distribution | Jun 16, 2023 | Imitation Learning, MuJoCo | Code Available | 0 |
| Recurrent Action Transformer with Memory | Jun 15, 2023 | Atari Games, MuJoCo | Code Available | 0 |
| Language to Rewards for Robotic Skill Synthesis | Jun 14, 2023 | In-Context Learning, Logical Reasoning | Unverified | 0 |
| Robust Reinforcement Learning through Efficient Adversarial Herding | Jun 12, 2023 | MuJoCo, reinforcement-learning | Unverified | 0 |
| Mildly Constrained Evaluation Policy for Offline Reinforcement Learning | Jun 6, 2023 | D4RL, MuJoCo | Code Available | 0 |
| ReLU to the Rescue: Improve Your On-Policy Actor-Critic with Positive Advantages | Jun 2, 2023 | Bayesian Inference, continuous-control | Code Available | 0 |
| MetaDiffuser: Diffusion Model as Conditional Planner for Offline Meta-RL | May 31, 2023 | MuJoCo, Reinforcement Learning (RL) | Unverified | 0 |
| Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration | May 29, 2023 | MuJoCo | Code Available | 1 |
| A Model-Based Solution to the Offline Multi-Agent Reinforcement Learning Coordination Problem | May 26, 2023 | MuJoCo, Multi-agent Reinforcement Learning | Unverified | 0 |
| Inverse Reinforcement Learning with the Average Reward Criterion | May 24, 2023 | MuJoCo, reinforcement-learning | Unverified | 0 |
| OER: Offline Experience Replay for Continual Offline Reinforcement Learning | May 23, 2023 | Continual Learning, MuJoCo | Unverified | 0 |
| Policy Representation via Diffusion Probability Model for Reinforcement Learning | May 22, 2023 | continuous-control, Continuous Control | Code Available | 1 |
| TOM: Learning Policy-Aware Models for Model-Based Reinforcement Learning via Transition Occupancy Matching | May 22, 2023 | Model-based Reinforcement Learning, MuJoCo | Unverified | 0 |
| Unsupervised Discovery of Continuous Skills on a Sphere | May 21, 2023 | MuJoCo, Unsupervised Reinforcement Learning | Unverified | 0 |
| Off-Policy Average Reward Actor-Critic with Deterministic Policy Search | May 20, 2023 | MuJoCo | Code Available | 0 |
| Bayesian Reparameterization of Reward-Conditioned Reinforcement Learning with Energy-based Models | May 18, 2023 | MuJoCo, Offline RL | Unverified | 0 |
| Client Selection for Federated Policy Optimization with Environment Heterogeneity | May 18, 2023 | MuJoCo, Policy Gradient Methods | Code Available | 0 |
| Coagent Networks: Generalized and Scaled | May 16, 2023 | MuJoCo, Reinforcement Learning (RL) | Unverified | 0 |
| Delay-Adapted Policy Optimization and Improved Regret for Adversarial MDP with Delayed Bandit Feedback | May 13, 2023 | MuJoCo, Reinforcement Learning (RL) | Unverified | 0 |
| DEFENDER: DTW-Based Episode Filtering Using Demonstrations for Enhancing RL Safety | May 8, 2023 | MuJoCo | Unverified | 0 |