| Phasic Diversity Optimization for Population-Based Reinforcement Learning | Mar 17, 2024 | Diversity, MuJoCo | Unverified | 0 |
| Symmetric Q-learning: Reducing Skewness of Bellman Error in Online Reinforcement Learning | Mar 12, 2024 | continuous-control, Continuous Control | Unverified | 0 |
| DeepSafeMPC: Deep Learning-Based Model Predictive Control for Safe Multi-Agent Reinforcement Learning | Mar 11, 2024 | Model Predictive Control, MuJoCo | Unverified | 0 |
| Conservative DDPG -- Pessimistic RL without Ensemble | Mar 8, 2024 | MuJoCo | Unverified | 0 |
| Iterated Q-Network: Beyond One-Step Bellman Updates in Deep Reinforcement Learning | Mar 4, 2024 | Atari Games, continuous-control | Unverified | 0 |
| Continuous Mean-Zero Disagreement-Regularized Imitation Learning (CMZ-DRIL) | Mar 2, 2024 | Imitation Learning, MuJoCo | Unverified | 0 |
| Snapshot Reinforcement Learning: Leveraging Prior Trajectories for Efficiency | Mar 1, 2024 | Deep Reinforcement Learning, MuJoCo | Code Available | 0 |
| C-GAIL: Stabilizing Generative Adversarial Imitation Learning with Control Theory | Feb 26, 2024 | Imitation Learning, MuJoCo | Unverified | 0 |
| Beyond Worst-case Attacks: Robust RL with Adaptive Defense via Non-dominated Policies | Feb 20, 2024 | Adversarial Attack, MuJoCo | Code Available | 0 |
| Debiased Offline Representation Learning for Fast Online Adaptation in Non-stationary Dynamics | Feb 17, 2024 | MuJoCo, Representation Learning | Code Available | 0 |
| Learn to Teach: Sample-Efficient Privileged Learning for Humanoid Locomotion over Diverse Terrains | Feb 9, 2024 | Depth Estimation, MuJoCo | Unverified | 0 |
| ALOHA 2: An Enhanced Low-Cost Hardware for Bimanual Teleoperation | Feb 7, 2024 | MuJoCo | Unverified | 0 |
| Compressing Deep Reinforcement Learning Networks with a Dynamic Structured Pruning Method for Autonomous Driving | Feb 7, 2024 | Autonomous Driving, Deep Reinforcement Learning | Unverified | 0 |
| Latent Plan Transformer for Trajectory Abstraction: Planning as Latent Space Inference | Feb 7, 2024 | MuJoCo | Code Available | 1 |
| Accelerating Inverse Reinforcement Learning with Expert Bootstrapping | Feb 4, 2024 | Imitation Learning, MuJoCo | Unverified | 0 |
| SQT -- std Q-target | Feb 3, 2024 | MuJoCo, Q-Learning | Unverified | 0 |
| MinMaxMin Q-learning | Feb 3, 2024 | MuJoCo, Q-Learning | Unverified | 0 |
| Expert Proximity as Surrogate Rewards for Single Demonstration Imitation Learning | Feb 1, 2024 | Imitation Learning, MuJoCo | Code Available | 0 |
| A Reinforcement Learning Based Controller to Minimize Forces on the Crutches of a Lower-Limb Exoskeleton | Jan 31, 2024 | Deep Reinforcement Learning, MuJoCo | Unverified | 0 |
| Extrinsicaly Rewarded Soft Q Imitation Learning with Discriminator | Jan 30, 2024 | Imitation Learning, MuJoCo | Unverified | 0 |
| Simple Policy Optimization | Jan 29, 2024 | MuJoCo | Code Available | 2 |
| Episodic Reinforcement Learning with Expanded State-reward Space | Jan 19, 2024 | Autonomous Driving, Deep Reinforcement Learning | Unverified | 0 |
| AgentMixer: Multi-Agent Correlated Policy Factorization | Jan 16, 2024 | Imitation Learning, MuJoCo | Unverified | 0 |
| Neural Population Learning beyond Symmetric Zero-sum Games | Jan 10, 2024 | MuJoCo, Transfer Learning | Unverified | 0 |
| An Invariant Information Geometric Method for High-Dimensional Online Optimization | Jan 3, 2024 | Bayesian Optimization, MuJoCo | Code Available | 0 |
| Global Convergence of Natural Policy Gradient with Hessian-aided Momentum Variance Reduction | Jan 2, 2024 | MuJoCo, Policy Gradient Methods | Unverified | 0 |
| Adaptive trajectory-constrained exploration strategy for deep reinforcement learning | Dec 27, 2023 | Deep Reinforcement Learning, MuJoCo | Code Available | 0 |
| Efficient Reinforcement Learning via Decoupling Exploration and Utilization | Dec 26, 2023 | Autonomous Vehicles, MuJoCo | Code Available | 1 |
| XuanCe: A Comprehensive and Unified Deep Reinforcement Learning Library | Dec 25, 2023 | CPU, Deep Reinforcement Learning | Code Available | 3 |
| DexDLO: Learning Goal-Conditioned Dexterous Policy for Dynamic Manipulation of Deformable Linear Objects | Dec 23, 2023 | MuJoCo, Position | Unverified | 0 |
| OVD-Explorer: Optimism Should Not Be the Sole Pursuit of Exploration in Noisy Environments | Dec 19, 2023 | continuous-control, Continuous Control | Unverified | 0 |
| GO-DICE: Goal-Conditioned Option-Aware Offline Imitation Learning via Stationary Distribution Correction Estimation | Dec 17, 2023 | Imitation Learning, MuJoCo | Code Available | 0 |
| Small Dataset, Big Gains: Enhancing Reinforcement Learning by Offline Pre-Training with Model Based Augmentation | Dec 15, 2023 | Data Augmentation, MuJoCo | Unverified | 0 |
| World Models via Policy-Guided Trajectory Diffusion | Dec 13, 2023 | continuous-control, Continuous Control | Code Available | 1 |
| A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning | Dec 12, 2023 | MuJoCo, Offline RL | Code Available | 0 |
| A dynamical clipping approach with task feedback for Proximal Policy Optimization | Dec 12, 2023 | Language Modelling, Large Language Model | Code Available | 0 |
| Similarity-based Knowledge Transfer for Cross-Domain Reinforcement Learning | Dec 5, 2023 | MuJoCo, reinforcement-learning | Unverified | 0 |
| Supported Trust Region Optimization for Offline Reinforcement Learning | Nov 15, 2023 | MuJoCo, reinforcement-learning | Unverified | 0 |
| On-Policy Policy Gradient Reinforcement Learning Without On-Policy Sampling | Nov 14, 2023 | MuJoCo, reinforcement-learning | Unverified | 0 |
| An Intelligent Social Learning-based Optimization Strategy for Black-box Robotic Control with Reinforcement Learning | Nov 11, 2023 | continuous-control, Continuous Control | Unverified | 0 |
| Optimistic Multi-Agent Policy Gradient | Nov 3, 2023 | MuJoCo, Q-Learning | Code Available | 1 |
| Robust Adversarial Reinforcement Learning via Bounded Rationality Curricula | Nov 3, 2023 | MuJoCo, reinforcement-learning | Unverified | 0 |
| A Tractable Inference Perspective of Offline RL | Oct 31, 2023 | MuJoCo, Offline RL | Unverified | 0 |
| Good Better Best: Self-Motivated Imitation Learning for noisy Demonstrations | Oct 24, 2023 | Imitation Learning, MuJoCo | Unverified | 0 |
| Mind the Model, Not the Agent: The Primacy Bias in Model-based RL | Oct 23, 2023 | continuous-control, Continuous Control | Unverified | 0 |
| Policy Gradient with Kernel Quadrature | Oct 23, 2023 | Causal Discovery, MuJoCo | Unverified | 0 |
| One is More: Diverse Perspectives within a Single Network for Efficient DRL | Oct 21, 2023 | Deep Reinforcement Learning, MuJoCo | Unverified | 0 |
| Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning | Oct 19, 2023 | MuJoCo, Prompt Engineering | Code Available | 1 |
| Benchmarking the Sim-to-Real Gap in Cloth Manipulation | Oct 14, 2023 | Benchmarking, MuJoCo | Unverified | 0 |
| LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios | Oct 12, 2023 | Board Games, Decision Making | Code Available | 0 |