| A general class of surrogate functions for stable and efficient reinforcement learning | Aug 12, 2021 | MuJoCo, Policy Gradient Methods | Code Available | 0 |
| Regret Minimization Experience Replay in Off-Policy Reinforcement Learning | May 15, 2021 | MuJoCo, reinforcement-learning | Code Available | 0 |
| Regularized Anderson Acceleration for Off-Policy Deep Reinforcement Learning | Sep 7, 2019 | Deep Reinforcement Learning, MuJoCo | Code Available | 0 |
| Adaptive trajectory-constrained exploration strategy for deep reinforcement learning | Dec 27, 2023 | Deep Reinforcement Learning, MuJoCo | Code Available | 0 |
| ToriLLE: Learning Environment for Hand-to-Hand Combat | Jul 26, 2018 | BIG-bench Machine Learning, MuJoCo | Code Available | 0 |
| Efficient Reward Poisoning Attacks on Online Deep Reinforcement Learning | May 30, 2022 | Data Poisoning, Deep Reinforcement Learning | Code Available | 0 |
| A dynamical clipping approach with task feedback for Proximal Policy Optimization | Dec 12, 2023 | Language Modelling, Large Language Model | Code Available | 0 |