SOTAVerified

OpenAI Gym

An open-source toolkit from OpenAI that implements several reinforcement learning benchmarks, including classic control, Atari, robotics, and MuJoCo tasks.

(Description by Evolutionary learning of interpretable decision trees)


Papers

Showing 201–250 of 382 papers

| Title | Status | Hype |
| Noisy Spiking Actor Network for Exploration | | 0 |
| Non-Markovian Control with Gated End-to-End Memory Policy Networks | | 0 |
| Offline Inverse Reinforcement Learning | | 0 |
| Off-OAB: Off-Policy Policy Gradient Method with Optimal Action-Dependent Baseline | | 0 |
| Off-Policy Reinforcement Learning with Loss Function Weighted by Temporal Difference Error | | 0 |
| On Combining Expert Demonstrations in Imitation Learning via Optimal Transport | | 0 |
| Online Robust Policy Learning in the Presence of Unknown Adversaries | | 0 |
| Asymptotic Analysis of Sample-averaged Q-learning | | 0 |
| Optimism is All You Need: Model-Based Imitation Learning From Observation Alone | | 0 |
| Optimizing 2D+1 Packing in Constrained Environments Using Deep Reinforcement Learning | | 0 |
| Optimizing Sensor Redundancy in Sequential Decision-Making Problems | | 0 |
| Photonic Quantum Policy Learning in OpenAI Gym | | 0 |
| Policy Gradient using Weak Derivatives for Reinforcement Learning | | 0 |
| Population-coding and Dynamic-neurons improved Spiking Actor Network for Reinforcement Learning | | 0 |
| Provably Efficient Convergence of Primal-Dual Actor-Critic with Nonlinear Function Approximation | | 0 |
| Proximal Policy Gradient: PPO with Policy Gradient | | 0 |
| Proximal Policy Optimization with Continuous Bounded Action Space via the Beta Distribution | | 0 |
| Decision-Making in Reinforcement Learning | | 0 |
| Qualitative Measurements of Policy Discrepancy for Return-Based Deep Q-Network | | 0 |
| Quality Diversity Evolutionary Learning of Decision Trees | | 0 |
| Reward Prediction Error as an Exploration Objective in Deep RL | | 0 |
| RAIL: A modular framework for Reinforcement-learning-based Adversarial Imitation Learning | | 0 |
| RangL: A Reinforcement Learning Competition Platform | | 0 |
| The Smart Buildings Control Suite: A Diverse Open Source Benchmark to Evaluate and Scale HVAC Control Policies for Sustainability | | 0 |
| Recommendation System-based Upper Confidence Bound for Online Advertising | | 0 |
| A Learning Approach to Robot-Agnostic Force-Guided High Precision Assembly | | 0 |
| WD3: Taming the Estimation Bias in Deep Reinforcement Learning | | 0 |
| Refined Continuous Control of DDPG Actors via Parametrised Activation | | 0 |
| REIN-2: Giving Birth to Prepared Reinforcement Learning Agents Using Reinforcement Learning Agents | | 0 |
| Reinforcement Learning Approach for Multi-Agent Flexible Scheduling Problems | | 0 |
| Reinforcement Learning for Robotics and Control with Active Uncertainty Reduction | | 0 |
| Reinforcement Learning using Guided Observability | | 0 |
| Relative Importance Sampling for off-Policy Actor-Critic in Deep Reinforcement Learning | | 0 |
| Remember and Forget Experience Replay for Multi-Agent Reinforcement Learning | | 0 |
| Resilient Control of Networked Microgrids using Vertical Federated Reinforcement Learning: Designs and Real-Time Test-Bed Validations | | 0 |
| Rethinking Population-assisted Off-policy Reinforcement Learning | | 0 |
| Robustness Evaluation of Offline Reinforcement Learning for Robot Control Against Action Perturbations | | 0 |
| Sample-based Distributional Policy Gradient | | 0 |
| Scaling Distributed Multi-task Reinforcement Learning with Experience Sharing | | 0 |
| Scilab-RL: A software framework for efficient reinforcement learning and cognitive modeling research | | 0 |
| Sepsis World Model: A MIMIC-based OpenAI Gym "World Model" Simulator for Sepsis Treatment | | 0 |
| Sequential Learning of Movement Prediction in Dynamic Environments using LSTM Autoencoder | | 0 |
| Session-Level Dynamic Ad Load Optimization using Offline Robust Reinforcement Learning | | 0 |
| SIMILE: Introducing Sequential Information towards More Effective Imitation Learning | | 0 |
| Soft Actor-Critic with Inhibitory Networks for Faster Retraining | | 0 |
| State Distribution-aware Sampling for Deep Q-learning | | 0 |
| Statistically Efficient Variance Reduction with Double Policy Estimation for Off-Policy Evaluation in Sequence-Modeled Reinforcement Learning | | 0 |
| Stealing That Free Lunch: Exposing the Limits of Dyna-Style Reinforcement Learning | | 0 |
| STITCH-OPE: Trajectory Stitching with Guided Diffusion for Off-Policy Evaluation | | 0 |
| Structured Evolution with Compact Architectures for Scalable Policy Optimization | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| 1 | MEow | Average Return | 6,586.33 | | Unverified |
| 2 | TD3 | Average Return | 5,942.55 | | Unverified |
| 3 | SAC | Average Return | 5,208.09 | | Unverified |
| 4 | DDPG | Average Return | 1,712.12 | | Unverified |
| 5 | PPO | Average Return | 608.97 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| 1 | SAC | Average Return | 15,836.04 | | Unverified |
| 2 | DDPG | Average Return | 14,934.86 | | Unverified |
| 3 | TD3 | Average Return | 12,026.73 | | Unverified |
| 4 | MEow | Average Return | 10,981.47 | | Unverified |
| 5 | PPO | Average Return | 6,006.11 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| 1 | MEow | Average Return | 3,332.99 | | Unverified |
| 2 | TD3 | Average Return | 3,319.98 | | Unverified |
| 3 | SAC | Average Return | 2,882.56 | | Unverified |
| 4 | DDPG | Average Return | 1,290.24 | | Unverified |
| 5 | PPO | Average Return | 790.77 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| 1 | MEow | Average Return | 6,923.22 | | Unverified |
| 2 | SAC | Average Return | 6,211.5 | | Unverified |
| 3 | PPO | Average Return | 925.89 | | Unverified |
| 4 | TD3 | Average Return | 198.44 | | Unverified |
| 5 | DDPG | Average Return | 139.14 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| 1 | SAC | Average Return | 5,745.27 | | Unverified |
| 2 | MEow | Average Return | 5,526.66 | | Unverified |
| 3 | DDPG | Average Return | 2,994.54 | | Unverified |
| 4 | PPO | Average Return | 2,739.81 | | Unverified |
| 5 | TD3 | Average Return | 2,612.74 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| 1 | TLA | Mean Reward | 5,163.54 | | Unverified |
| 2 | AWR | Mean Reward | 5,067 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| 1 | Orthogonal decision tree | Average Return | 500 | | Unverified |
| 2 | Oblique decision tree | Average Return | 500 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| 1 | TLA | Mean Reward | 9,571.99 | | Unverified |
| 2 | AWR | Mean Reward | 9,136 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| 1 | TLA | Mean Reward | 3,458.22 | | Unverified |
| 2 | AWR | Mean Reward | 3,405 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| 1 | Oblique decision tree | Average Return | 272.14 | | Unverified |
| 2 | AWR | Average Return | 229 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| 1 | Orthogonal decision tree | Average Return | -101.72 | | Unverified |
| 2 | Oblique decision tree | Average Return | -106.02 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| 1 | TLA with Hierarchical Reward Functions | Mean Reward | -125.02 | | Unverified |
| 2 | TLA | Mean Reward | -154.92 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| 1 | AWR | Mean Reward | 5,813 | | Unverified |
| 2 | TLA | Mean Reward | 3,878.41 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| 1 | AWR | Average Return | 4,996 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| 1 | TLA | Mean Reward | 9,356.67 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| 1 | TLA | Mean Reward | 1,000 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| 1 | TLA | Mean Reward | 93.88 | | Unverified |