| Robust Offline Reinforcement Learning with Gradient Penalty and Constraint Relaxation | Oct 19, 2022 | D4RL, MuJoCo | Code Available | 0 |
| Simple Emergent Action Representations from Multi-Task Policy Training | Oct 18, 2022 | Deep Reinforcement Learning, MuJoCo | Unverified | 0 |
| WILD-SCAV: Benchmarking FPS Gaming AI on Unity3D-based Environments | Oct 14, 2022 | Atari Games, Benchmarking | Code Available | 1 |
| Policy Gradient With Serial Markov Chain Reasoning | Oct 13, 2022 | Decision Making, MuJoCo | Unverified | 0 |
| Mind's Eye: Grounded Language Model Reasoning through Simulation | Oct 11, 2022 | Language Modeling, Language Modelling | Unverified | 0 |
| Maximum-Likelihood Inverse Reinforcement Learning with Finite-Time Guarantees | Oct 4, 2022 | counterfactual, Imitation Learning | Unverified | 0 |
| Monte Carlo Tree Search based Variable Selection for High Dimensional Bayesian Optimization | Oct 4, 2022 | Bayesian Optimization, MuJoCo | Code Available | 1 |
| Structural Estimation of Markov Decision Processes in High-Dimensional State Space with Finite-Time Guarantees | Oct 4, 2022 | Imitation Learning, MuJoCo | Unverified | 0 |
| Boosting Exploration in Actor-Critic Algorithms by Incentivizing Plausible Novel States | Oct 1, 2022 | continuous-control, Continuous Control | Unverified | 0 |
| On the Convergence Theory of Meta Reinforcement Learning with Personalized Policies | Sep 21, 2022 | continuous-control, Continuous Control | Unverified | 0 |
| A Computational Model of Learning Flexible Navigation in a Maze by Layout-Conforming Replay of Place Cells | Sep 18, 2022 | MuJoCo | Unverified | 0 |
| Value Summation: A Novel Scoring Function for MPC-based Model-based Reinforcement Learning | Sep 16, 2022 | Model-based Reinforcement Learning, MuJoCo | Unverified | 0 |
| Masked Imitation Learning: Discovering Environment-Invariant Modalities in Multimodal Demonstrations | Sep 16, 2022 | Decision Making, Imitation Learning | Unverified | 0 |
| On the Reuse Bias in Off-Policy Reinforcement Learning | Sep 15, 2022 | continuous-control, Continuous Control | Code Available | 0 |
| Taming Multi-Agent Reinforcement Learning with Estimator Variance Reduction | Sep 2, 2022 | MuJoCo, Multi-agent Reinforcement Learning | Unverified | 0 |
| Dynamics-Adaptive Continual Reinforcement Learning via Progressive Contextualization | Sep 1, 2022 | Bayesian Inference, Knowledge Distillation | Unverified | 0 |
| Improving Sample Efficiency in Evolutionary RL Using Off-Policy Ranking | Aug 22, 2022 | MuJoCo, Reinforcement Learning (RL) | Unverified | 0 |
| Entropy Augmented Reinforcement Learning | Aug 19, 2022 | Deep Reinforcement Learning, MuJoCo | Unverified | 0 |
| Unified Policy Optimization for Continuous-action Reinforcement Learning in Non-stationary Tasks and Games | Aug 19, 2022 | MuJoCo, Reinforcement Learning (RL) | Unverified | 0 |
| A Game-Theoretic Perspective of Generalization in Reinforcement Learning | Aug 7, 2022 | Few-Shot Learning, Meta-Learning | Unverified | 0 |
| Backward Imitation and Forward Reinforcement Learning via Bi-directional Model Rollouts | Aug 4, 2022 | Generative Adversarial Network, Model-based Reinforcement Learning | Unverified | 0 |
| Heterogeneous-Agent Mirror Learning: A Continuum of Solutions to Cooperative MARL | Aug 2, 2022 | MuJoCo, Multi-agent Reinforcement Learning | Unverified | 0 |
| Cyclic Policy Distillation: Sample-Efficient Sim-to-Real Reinforcement Learning with Domain Randomization | Jul 29, 2022 | Deep Reinforcement Learning, MuJoCo | Code Available | 0 |
| Learning Bipedal Walking On Planned Footsteps For Humanoid Robots | Jul 26, 2022 | Deep Reinforcement Learning, MuJoCo | Code Available | 3 |
| Live in the Moment: Learning Dynamics Model Adapted to Evolving Policy | Jul 25, 2022 | continuous-control, Continuous Control | Code Available | 0 |
| Resolving Copycat Problems in Visual Imitation Learning via Residual Action Prediction | Jul 20, 2022 | Imitation Learning, MuJoCo | Unverified | 0 |
| Feasible Adversarial Robust Reinforcement Learning for Underspecified Environments | Jul 19, 2022 | MuJoCo, reinforcement-learning | Unverified | 0 |
| Short-Term Plasticity Neurons Learning to Learn and Forget | Jun 28, 2022 | MuJoCo, Reinforcement Learning (RL) | Code Available | 1 |
| Prompting Decision Transformer for Few-Shot Policy Generalization | Jun 27, 2022 | Few-Shot Learning, Inductive Bias | Unverified | 0 |
| CGAR: Critic Guided Action Redistribution in Reinforcement Leaning | Jun 23, 2022 | MuJoCo, Reinforcement Learning (RL) | Code Available | 0 |
| Fighting Fire with Fire: Avoiding DNN Shortcuts through Priming | Jun 22, 2022 | Autonomous Driving, Classification | Unverified | 0 |
| EnvPool: A Highly Parallel Reinforcement Learning Environment Execution Engine | Jun 21, 2022 | MuJoCo, reinforcement-learning | Code Available | 5 |
| Beyond Rewards: a Hierarchical Perspective on Offline Multiagent Behavioral Analysis | Jun 17, 2022 | MuJoCo, Starcraft | Unverified | 0 |
| Mean-Semivariance Policy Optimization via Risk-Averse Reinforcement Learning | Jun 15, 2022 | Autonomous Driving, continuous-control | Unverified | 0 |
| Relative Policy-Transition Optimization for Fast Policy Transfer | Jun 13, 2022 | continuous-control, Continuous Control | Unverified | 0 |
| Dealing with Sparse Rewards in Continuous Control Robotics via Heavy-Tailed Policies | Jun 12, 2022 | continuous-control, Continuous Control | Unverified | 0 |
| Towards Safe Reinforcement Learning via Constraining Conditional Value-at-Risk | Jun 9, 2022 | continuous-control, Continuous Control | Code Available | 1 |
| Hybrid Value Estimation for Off-policy Evaluation and Offline Reinforcement Learning | Jun 4, 2022 | MuJoCo, Off-policy evaluation | Unverified | 0 |
| Transferable Reward Learning by Dynamics-Agnostic Discriminator Ensemble | Jun 1, 2022 | Imitation Learning, MuJoCo | Unverified | 0 |
| Multi-Object Grasping in the Plane | Jun 1, 2022 | MuJoCo, Object | Unverified | 0 |
| TaSIL: Taylor Series Imitation Learning | May 30, 2022 | continuous-control, Continuous Control | Code Available | 0 |
| SEREN: Knowing When to Explore and When to Exploit | May 30, 2022 | MuJoCo, Reinforcement Learning (RL) | Unverified | 0 |
| Efficient Reward Poisoning Attacks on Online Deep Reinforcement Learning | May 30, 2022 | Data Poisoning, Deep Reinforcement Learning | Code Available | 0 |
| Multi-Agent Reinforcement Learning is a Sequence Modeling Problem | May 30, 2022 | Decision Making, MuJoCo | Code Available | 2 |
| ARLO: A Framework for Automated Reinforcement Learning | May 20, 2022 | feature selection, MuJoCo | Code Available | 1 |
| Data Valuation for Offline Reinforcement Learning | May 19, 2022 | Data Valuation, Deep Reinforcement Learning | Unverified | 0 |
| Imitation Learning from Observations under Transition Model Disparity | Apr 25, 2022 | Imitation Learning, model | Code Available | 0 |
| A Computational Theory of Learning Flexible Reward-Seeking Behavior with Place Cells | Apr 22, 2022 | MuJoCo, Open-Ended Question Answering | Unverified | 0 |
| JORLDY: a fully customizable open source framework for reinforcement learning | Apr 11, 2022 | MuJoCo, OpenAI Gym | Code Available | 2 |
| Continuously Discovering Novel Strategies via Reward-Switching Policy Optimization | Apr 4, 2022 | continuous-control, Continuous Control | Unverified | 0 |