| Structural Estimation of Markov Decision Processes in High-Dimensional State Space with Finite-Time Guarantees | Oct 4, 2022 | Imitation Learning, MuJoCo | Unverified | 0 | 0 |
| Supported Trust Region Optimization for Offline Reinforcement Learning | Nov 15, 2023 | MuJoCo, reinforcement-learning | Unverified | 0 | 0 |
| Surrogate-Assisted Evolutionary Reinforcement Learning Based on Autoencoder and Hyperbolic Neural Network | May 26, 2025 | Evolutionary Algorithms, MuJoCo | Unverified | 0 | 0 |
| Symmetric Q-learning: Reducing Skewness of Bellman Error in Online Reinforcement Learning | Mar 12, 2024 | continuous-control, Continuous Control | Unverified | 0 | 0 |
| Temporal Abstraction in Reinforcement Learning with Offline Data | Jul 21, 2024 | Hierarchical Reinforcement Learning, MuJoCo | Unverified | 0 | 0 |
| Temporal-adaptive Hierarchical Reinforcement Learning | Feb 6, 2020 | Atari Games, Hierarchical Reinforcement Learning | Unverified | 0 | 0 |
| MinMaxMin Q-learning | Feb 3, 2024 | MuJoCo, Q-Learning | Unverified | 0 | 0 |
| SQT -- std Q-target | Feb 3, 2024 | MuJoCo, Q-Learning | Unverified | 0 | 0 |
| Text-to-Decision Agent: Learning Generalist Policies from Natural Language Supervision | Apr 21, 2025 | MuJoCo, Zero-shot Generalization | Unverified | 0 | 0 |
| The Courage to Stop: Overcoming Sunk Cost Fallacy in Deep Reinforcement Learning | Jun 16, 2025 | Deep Reinforcement Learning, MuJoCo | Unverified | 0 | 0 |
| The Exploration-Exploitation Dilemma Revisited: An Entropy Perspective | Aug 19, 2024 | MuJoCo | Unverified | 0 | 0 |
| The Intentional Unintentional Agent: Learning to Solve Many Continuous Control Tasks Simultaneously | Jul 11, 2017 | continuous-control, Continuous Control | Unverified | 0 | 0 |
| The Ladder in Chaos: A Simple and Effective Improvement to General DRL Algorithms by Policy Path Trimming and Boosting | Mar 2, 2023 | MuJoCo, Reinforcement Learning (RL) | Unverified | 0 | 0 |
| Theoretically Guaranteed Policy Improvement Distilled from Model-Based Planning | Jul 24, 2023 | continuous-control, Continuous Control | Unverified | 0 | 0 |
| Theoretically Principled Deep RL Acceleration via Nearest Neighbor Function Approximation | Oct 9, 2021 | Deep Reinforcement Learning, MuJoCo | Unverified | 0 | 0 |
| Mind the Model, Not the Agent: The Primacy Bias in Model-based RL | Oct 23, 2023 | continuous-control, Continuous Control | Unverified | 0 | 0 |
| Time-Efficient Reward Learning via Visually Assisted Cluster Ranking | Nov 30, 2022 | Dimensionality Reduction, MuJoCo | Unverified | 0 | 0 |
| TIMRL: A Novel Meta-Reinforcement Learning Framework for Non-Stationary and Multi-Task Environments | Jan 13, 2025 | Decision Making, Meta Reinforcement Learning | Unverified | 0 | 0 |
| TOM: Learning Policy-Aware Models for Model-Based Reinforcement Learning via Transition Occupancy Matching | May 22, 2023 | Model-based Reinforcement Learning, MuJoCo | Unverified | 0 | 0 |
| STOPS: Short-Term-based Volatility-controlled Policy Search and its Global Convergence | Jan 24, 2022 | MuJoCo | Unverified | 0 | 0 |
| Toward Evaluating Robustness of Deep Reinforcement Learning with Continuous Control | May 1, 2020 | continuous-control, Continuous Control | Unverified | 0 | 0 |
| Towards Characterizing Divergence in Deep Q-Learning | Mar 21, 2019 | continuous-control, Continuous Control | Unverified | 0 | 0 |
| Towards Simplicity in Deep Reinforcement Learning: Streamlined Off-Policy Learning | Sep 25, 2019 | continuous-control, Continuous Control | Unverified | 0 | 0 |
| Transferable Reward Learning by Dynamics-Agnostic Discriminator Ensemble | Jun 1, 2022 | Imitation Learning, MuJoCo | Unverified | 0 | 0 |
| Tsallis Reinforcement Learning: A Unified Framework for Maximum Entropy Reinforcement Learning | Jan 31, 2019 | MuJoCo, reinforcement-learning | Unverified | 0 | 0 |
| Turning Sand to Gold: Recycling Data to Bridge On-Policy and Off-Policy Learning via Causal Bound | Jul 15, 2025 | counterfactual, Decision Making | Unverified | 0 | 0 |
| Uncertainty-aware Low-Rank Q-Matrix Estimation for Deep Reinforcement Learning | Nov 19, 2021 | continuous-control, Continuous Control | Unverified | 0 | 0 |
| Understanding the Asymptotic Performance of Model-Based RL Methods | Sep 27, 2018 | Model-based Reinforcement Learning, MuJoCo | Unverified | 0 | 0 |
| Unified Policy Optimization for Continuous-action Reinforcement Learning in Non-stationary Tasks and Games | Aug 19, 2022 | MuJoCo, Reinforcement Learning (RL) | Unverified | 0 | 0 |
| Universal Successor Features for Transfer Reinforcement Learning | Jan 5, 2020 | MuJoCo, reinforcement-learning | Unverified | 0 | 0 |
| Unsupervised Discovery of Continuous Skills on a Sphere | May 21, 2023 | MuJoCo, Unsupervised Reinforcement Learning | Unverified | 0 | 0 |
| User-Oriented Robust Reinforcement Learning | Feb 15, 2022 | MuJoCo, reinforcement-learning | Unverified | 0 | 0 |
| Value Improved Actor Critic Algorithms | Jun 3, 2024 | MuJoCo | Unverified | 0 | 0 |
| Value Summation: A Novel Scoring Function for MPC-based Model-based Reinforcement Learning | Sep 16, 2022 | Model-based Reinforcement Learning, MuJoCo | Unverified | 0 | 0 |
| Variance Reduction for Reinforcement Learning in Input-Driven Environments | Jul 6, 2018 | Meta-Learning, MuJoCo | Unverified | 0 | 0 |
| Variational OOD State Correction for Offline Reinforcement Learning | May 1, 2025 | Decision Making, MuJoCo | Unverified | 0 | 0 |
| V-MAO: Generative Modeling for Multi-Arm Manipulation of Articulated Objects | Nov 7, 2021 | MuJoCo, Object | Unverified | 0 | 0 |
| Wasserstein Actor-Critic: Directed Exploration via Optimism for Continuous-Actions Control | Mar 4, 2023 | MuJoCo, Q-Learning | Unverified | 0 | 0 |
| Wasserstein Unsupervised Reinforcement Learning | Oct 15, 2021 | Hierarchical Reinforcement Learning, MuJoCo | Unverified | 0 | 0 |
| Weighted Entropy Modification for Soft Actor-Critic | Nov 18, 2020 | MuJoCo, reinforcement-learning | Unverified | 0 | 0 |
| What About Taking Policy as Input of Value Function: Policy-extended Value Function Approximator | Sep 28, 2020 | continuous-control, Continuous Control | Unverified | 0 | 0 |
| Provably Robust Blackbox Optimization for Reinforcement Learning | Mar 7, 2019 | MuJoCo, reinforcement-learning | Unverified | 0 | 0 |
| Membership Inference Attacks Against Temporally Correlated Data in Deep Reinforcement Learning | Sep 8, 2021 | Adversarial Attack, continuous-control | Unverified | 0 | 0 |
| Yes, Q-learning Helps Offline In-Context RL | Feb 24, 2025 | In-Context Reinforcement Learning, MuJoCo | Unverified | 0 | 0 |
| Stealthy and Efficient Adversarial Attacks against Deep Reinforcement Learning | May 14, 2020 | Adversarial Attack, Deep Reinforcement Learning | Unverified | 0 | 0 |
| Inverse Reinforcement Learning with the Average Reward Criterion | May 24, 2023 | MuJoCo, reinforcement-learning | Unverified | 0 | 0 |
| SelfBC: Self Behavior Cloning for Offline Reinforcement Learning | Aug 4, 2024 | Attribute, D4RL | Unverified | 0 | 0 |
| SrSv: Integrating Sequential Rollouts with Sequential Value Estimation for Multi-agent Reinforcement Learning | Mar 3, 2025 | MuJoCo, Multi-agent Reinforcement Learning | Unverified | 0 | 0 |
| Modular Recurrence in Contextual MDPs for Universal Morphology Control | Jun 10, 2025 | Deep Reinforcement Learning, MuJoCo | Unverified | 0 | 0 |
| Wasserstein Barycenter Soft Actor-Critic | Jun 11, 2025 | continuous-control, Continuous Control | Unverified | 0 | 0 |