| Diffusion-based Reinforcement Learning via Q-weighted Variational Policy Optimization | May 25, 2024 | Continuous Control | Code Available | 2 |
| Adaptive Q-Network: On-the-fly Target Selection for Deep Reinforcement Learning | May 25, 2024 | Atari Games, AutoML | Unverified | 0 |
| Diffusion Actor-Critic with Entropy Regulator | May 24, 2024 | Decision Making, MuJoCo | Code Available | 2 |
| Variational Delayed Policy Optimization | May 23, 2024 | MuJoCo, Reinforcement Learning (RL) | Code Available | 0 |
| Maximum Entropy Reinforcement Learning via Energy-Based Normalizing Flow | May 22, 2024 | Ingenuity, MuJoCo | Code Available | 1 |
| Learning rigid-body simulators over implicit shapes for large-scale scenes and vision | May 22, 2024 | MuJoCo | Unverified | 0 |
| Pure Planning to Pure Policies and In Between with a Recursive Tree Planner | May 21, 2024 | MuJoCo | Unverified | 0 |
| Is Mamba Compatible with Trajectory Optimization in Offline Reinforcement Learning? | May 20, 2024 | Atari Games, Mamba | Code Available | 0 |
| Robust Deep Reinforcement Learning with Adaptive Adversarial Perturbations in Action Space | May 20, 2024 | Deep Reinforcement Learning, MuJoCo | Code Available | 0 |
| Adaptive Exploration for Data-Efficient General Value Function Evaluations | May 13, 2024 | MuJoCo | Code Available | 0 |
| Off-OAB: Off-Policy Policy Gradient Method with Optimal Action-Dependent Baseline | May 4, 2024 | Computational Efficiency, MuJoCo | Unverified | 0 |
| Hard-Thresholding Meets Evolution Strategies in Reinforcement Learning | May 2, 2024 | Decision Making, MuJoCo | Code Available | 0 |
| Balance Reward and Safety Optimization for Safe Reinforcement Learning: A Perspective of Gradient Manipulation | May 2, 2024 | MuJoCo, Reinforcement Learning (RL) | Code Available | 5 |
| S^2AC: Energy-Based Reinforcement Learning with Stein Soft Actor Critic | May 2, 2024 | MuJoCo, Variational Inference | Code Available | 1 |
| MESA: Cooperative Meta-Exploration in Multi-Agent Learning through Exploiting State-Action Space Structure | May 1, 2024 | Efficient Exploration, MuJoCo | Unverified | 0 |
| Markov flow policy -- deep MC | May 1, 2024 | MuJoCo | Unverified | 0 |
| No Representation, No Trust: Connecting Representation, Collapse, and Trust Issues in PPO | May 1, 2024 | MuJoCo, Reinforcement Learning (RL) | Code Available | 1 |
| UCB-driven Utility Function Search for Multi-objective Reinforcement Learning | May 1, 2024 | Decision Making, MuJoCo | Code Available | 1 |
| Closed Loop Interactive Embodied Reasoning for Robot Manipulation | Apr 23, 2024 | MuJoCo, Robot Manipulation | Unverified | 0 |
| Asynchronous Federated Reinforcement Learning with Policy Gradient Updates: Algorithm Design and Convergence Analysis | Apr 9, 2024 | MuJoCo, Reinforcement Learning (RL) | Unverified | 0 |
| Humanoid-Gym: Reinforcement Learning for Humanoid Robot with Zero-Shot Sim2Real Transfer | Apr 8, 2024 | MuJoCo, Physical Simulations | Code Available | 5 |
| DIDA: Denoised Imitation Learning based on Domain Adaptation | Apr 4, 2024 | Domain Adaptation, Imitation Learning | Unverified | 0 |
| Active Learning of Dynamics Using Prior Domain Knowledge in the Sampling Process | Mar 25, 2024 | Active Learning, MuJoCo | Unverified | 0 |
| Robust Model Based Reinforcement Learning Using L_1 Adaptive Control | Mar 21, 2024 | Model-based Reinforcement Learning, MuJoCo | Unverified | 0 |
| A Simple Mixture Policy Parameterization for Improving Sample Efficiency of CVaR Optimization | Mar 17, 2024 | MuJoCo | Unverified | 0 |