| Robust Offline Reinforcement Learning with Gradient Penalty and Constraint Relaxation | Oct 19, 2022 | D4RL, MuJoCo | —Unverified | 0 |
| Simple Emergent Action Representations from Multi-Task Policy Training | Oct 18, 2022 | Deep Reinforcement Learning, MuJoCo | —Unverified | 0 |
| WILD-SCAV: Benchmarking FPS Gaming AI on Unity3D-based Environments | Oct 14, 2022 | Atari Games, Benchmarking | Code Available | 1 |
| Policy Gradient With Serial Markov Chain Reasoning | Oct 13, 2022 | Decision Making, MuJoCo | —Unverified | 0 |
| Mind's Eye: Grounded Language Model Reasoning through Simulation | Oct 11, 2022 | Language Modeling, Language Modelling | —Unverified | 0 |
| Maximum-Likelihood Inverse Reinforcement Learning with Finite-Time Guarantees | Oct 4, 2022 | counterfactual, Imitation Learning | —Unverified | 0 |
| Monte Carlo Tree Search based Variable Selection for High Dimensional Bayesian Optimization | Oct 4, 2022 | Bayesian Optimization, MuJoCo | Code Available | 1 |
| Structural Estimation of Markov Decision Processes in High-Dimensional State Space with Finite-Time Guarantees | Oct 4, 2022 | Imitation Learning, MuJoCo | —Unverified | 0 |
| Boosting Exploration in Actor-Critic Algorithms by Incentivizing Plausible Novel States | Oct 1, 2022 | continuous-control, Continuous Control | —Unverified | 0 |
| On the Convergence Theory of Meta Reinforcement Learning with Personalized Policies | Sep 21, 2022 | continuous-control, Continuous Control | —Unverified | 0 |