| Global Convergence of Natural Policy Gradient with Hessian-aided Momentum Variance Reduction | Jan 2, 2024 | MuJoCo, Policy Gradient Methods | Unverified | 0 |
| Adaptive trajectory-constrained exploration strategy for deep reinforcement learning | Dec 27, 2023 | Deep Reinforcement Learning, MuJoCo | Code Available | 0 |
| Efficient Reinforcement Learning via Decoupling Exploration and Utilization | Dec 26, 2023 | Autonomous Vehicles, MuJoCo | Code Available | 1 |
| XuanCe: A Comprehensive and Unified Deep Reinforcement Learning Library | Dec 25, 2023 | CPU, Deep Reinforcement Learning | Code Available | 3 |
| DexDLO: Learning Goal-Conditioned Dexterous Policy for Dynamic Manipulation of Deformable Linear Objects | Dec 23, 2023 | MuJoCo, Position | Unverified | 0 |
| OVD-Explorer: Optimism Should Not Be the Sole Pursuit of Exploration in Noisy Environments | Dec 19, 2023 | continuous-control, Continuous Control | Unverified | 0 |
| GO-DICE: Goal-Conditioned Option-Aware Offline Imitation Learning via Stationary Distribution Correction Estimation | Dec 17, 2023 | Imitation Learning, MuJoCo | Code Available | 0 |
| Small Dataset, Big Gains: Enhancing Reinforcement Learning by Offline Pre-Training with Model Based Augmentation | Dec 15, 2023 | Data Augmentation, MuJoCo | Unverified | 0 |
| World Models via Policy-Guided Trajectory Diffusion | Dec 13, 2023 | continuous-control, Continuous Control | Code Available | 1 |
| A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning | Dec 12, 2023 | MuJoCo, Offline RL | Unverified | 0 |
| A dynamical clipping approach with task feedback for Proximal Policy Optimization | Dec 12, 2023 | Language Modelling, Large Language Model | Code Available | 0 |
| Similarity-based Knowledge Transfer for Cross-Domain Reinforcement Learning | Dec 5, 2023 | MuJoCo, reinforcement-learning | Unverified | 0 |
| Supported Trust Region Optimization for Offline Reinforcement Learning | Nov 15, 2023 | MuJoCo, reinforcement-learning | Unverified | 0 |
| On-Policy Policy Gradient Reinforcement Learning Without On-Policy Sampling | Nov 14, 2023 | MuJoCo, reinforcement-learning | Unverified | 0 |
| An Intelligent Social Learning-based Optimization Strategy for Black-box Robotic Control with Reinforcement Learning | Nov 11, 2023 | continuous-control, Continuous Control | Unverified | 0 |
| Optimistic Multi-Agent Policy Gradient | Nov 3, 2023 | MuJoCo, Q-Learning | Code Available | 1 |
| Robust Adversarial Reinforcement Learning via Bounded Rationality Curricula | Nov 3, 2023 | MuJoCo, reinforcement-learning | Unverified | 0 |
| A Tractable Inference Perspective of Offline RL | Oct 31, 2023 | MuJoCo, Offline RL | Unverified | 0 |
| Good Better Best: Self-Motivated Imitation Learning for noisy Demonstrations | Oct 24, 2023 | Imitation Learning, MuJoCo | Unverified | 0 |
| Mind the Model, Not the Agent: The Primacy Bias in Model-based RL | Oct 23, 2023 | continuous-control, Continuous Control | Unverified | 0 |
| Policy Gradient with Kernel Quadrature | Oct 23, 2023 | Causal Discovery, MuJoCo | Unverified | 0 |
| One is More: Diverse Perspectives within a Single Network for Efficient DRL | Oct 21, 2023 | Deep Reinforcement Learning, MuJoCo | Unverified | 0 |
| Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning | Oct 19, 2023 | MuJoCo, Prompt Engineering | Code Available | 1 |
| Benchmarking the Sim-to-Real Gap in Cloth Manipulation | Oct 14, 2023 | Benchmarking, MuJoCo | Unverified | 0 |
| LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios | Oct 12, 2023 | Board Games, Decision Making | Unverified | 0 |