
OpenAI Gym

An open-source toolkit from OpenAI that implements several reinforcement learning benchmarks, including classic control, Atari, robotics, and MuJoCo tasks.
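All of these benchmarks share one agent-environment interaction loop. The sketch below illustrates it with a hypothetical toy environment (`LineWorldEnv`, not part of Gym) that mimics the interface of Gym 0.26 and later. In that interface, `reset()` returns `(observation, info)` and `step(action)` returns `(observation, reward, terminated, truncated, info)`; older Gym releases returned a four-tuple `(observation, reward, done, info)` instead. With Gym installed, a built-in task such as `gym.make("CartPole-v1")` would take the place of the toy class. The helper `run_episode` and the policy `always_right` are likewise illustrative names only.

```python
from typing import Any, Callable, Dict, Optional, Tuple


class LineWorldEnv:
    """Hypothetical toy environment that mimics the Gym (>= 0.26) interface.

    The agent starts at position 0 and must reach position `goal`.
    Actions: 0 = move left, 1 = move right. Each step costs -1, reaching
    the goal pays +10, and episodes are truncated after `max_steps` steps.
    """

    def __init__(self, goal: int = 3, max_steps: int = 20) -> None:
        self.goal = goal
        self.max_steps = max_steps
        self.pos = 0
        self.t = 0

    def reset(self, seed: Optional[int] = None) -> Tuple[int, Dict[str, Any]]:
        # Like Gym >= 0.26, reset returns (observation, info).
        # This toy env is deterministic, so `seed` is accepted but unused.
        self.pos = 0
        self.t = 0
        return self.pos, {}

    def step(self, action: int) -> Tuple[int, float, bool, bool, Dict[str, Any]]:
        # Like Gym >= 0.26, step returns (obs, reward, terminated, truncated, info).
        self.pos += 1 if action == 1 else -1
        self.t += 1
        terminated = self.pos == self.goal  # the task itself ended
        truncated = not terminated and self.t >= self.max_steps  # time limit hit
        reward = 10.0 if terminated else -1.0
        return self.pos, reward, terminated, truncated, {}


def run_episode(env: LineWorldEnv, policy: Callable[[int], int],
                seed: Optional[int] = None) -> float:
    """Roll out one episode and return its undiscounted sum of rewards."""
    obs, _info = env.reset(seed=seed)
    total = 0.0
    while True:
        obs, reward, terminated, truncated, _info = env.step(policy(obs))
        total += reward
        if terminated or truncated:
            return total


def always_right(obs: int) -> int:
    """Trivial hand-written policy used only for illustration."""
    return 1


if __name__ == "__main__":
    print(run_episode(LineWorldEnv(), always_right))  # 8.0
```

Running it prints `8.0`: two steps at -1 each, then +10 on the step that reaches the goal. Keeping `terminated` separate from `truncated` lets a learner distinguish a genuine end of the task from an episode cut short by a time limit.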

(Description from the paper "Evolutionary learning of interpretable decision trees")

(Image Credit: OpenAI Gym)

Papers

Showing 51–100 of 382 papers

Title | Status | Hype
Stackelberg Actor-Critic: Game-Theoretic Reinforcement Learning Algorithms | Code | 1
Decision Transformer: Reinforcement Learning via Sequence Modeling | Code | 1
Ecole: A Gym-like Library for Machine Learning in Combinatorial Optimization Solvers | Code | 1
Reinforcement Learning for Control of Valves | Code | 1
Implicit Distributional Reinforcement Learning | Code | 1
Dynamic Sparse Training for Deep Reinforcement Learning | Code | 1
Can language agents be alternatives to PPO? A Preliminary Empirical Study On OpenAI Gym | Code | 1
Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning | Code | 1
A Reinforcement Learning Environment for Multi-Service UAV-enabled Wireless Systems | Code | 1
Blue River Controls: A toolkit for Reinforcement Learning Control Systems on Hardware | Code | 1
CaiRL: A High-Performance Reinforcement Learning Environment Toolkit | Code | 1
Controlgym: Large-Scale Control Environments for Benchmarking Reinforcement Learning Algorithms | Code | 1
Bayesian Soft Actor-Critic: A Directed Acyclic Strategy Graph Based Deep Reinforcement Learning | Code | 1
ABIDES-Gym: Gym Environments for Multi-Agent Discrete Event Simulation and Application to Financial Markets | Code | 1
Experience Replay with Likelihood-free Importance Weights | Code | 1
EpidemiOptim: A Toolbox for the Optimization of Control Policies in Epidemiological Models | Code | 1
NavRep: Unsupervised Representations for Reinforcement Learning of Robot Navigation in Dynamic Human Environments | Code | 1
For SALE: State-Action Representation Learning for Deep Reinforcement Learning | Code | 1
CityLearn: Standardizing Research in Multi-Agent Reinforcement Learning for Demand Response and Urban Energy Management | Code | 1
Improving Model-Based Reinforcement Learning with Internal State Representations through Self-Supervision | Code | 1
Addressing Function Approximation Error in Actor-Critic Methods | Code | 1
Towards Real-World Deployment of Reinforcement Learning for Traffic Signal Control | Code | 1
LongiControl: A Reinforcement Learning Environment for Longitudinal Vehicle Control | Code | 1
Mamba as Decision Maker: Exploring Multi-scale Sequence Modeling in Offline Reinforcement Learning | Code | 1
CompilerGym: Robust, Performant Compiler Optimization Environments for AI Research | Code | 1
An Open-Source Multi-Goal Reinforcement Learning Environment for Robotic Manipulation with Pybullet | Code | 1
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor | Code | 1
MoËT: Mixture of Expert Trees and its Application to Verifiable Reinforcement Learning | Code | 1
Adaptive Droplet Routing in Digital Microfluidic Biochips Using Deep Reinforcement Learning | | 0
myGym: Modular Toolkit for Visuomotor Robotic Tasks | | 0
BlockPuzzle - A Challenge in Physical Reasoning and Generalization for Robot Learning | | 0
Adaptive Experience Selection for Policy Gradient | | 0
Easy as ABCs: Unifying Boltzmann Q-Learning and Counterfactual Regret Minimization | | 0
DQN with model-based exploration: efficient learning on environments with sparse rewards | | 0
Airlift Challenge: A Competition for Optimizing Cargo Delivery | | 0
DriverGym: Democratising Reinforcement Learning for Autonomous Driving | | 0
Benchmarking Algorithms from Machine Learning for Low-Budget Black-Box Optimization | | 0
A Generalised Inverse Reinforcement Learning Framework | | 0
EasyRL: A Simple and Extensible Reinforcement Learning Framework | | 0
Behavior Cloning in OpenAI using Case Based Reasoning | | 0
Affine Transport for Sim-to-Real Domain Adaptation | | 0
Active Inference in Hebbian Learning Networks | | 0
Balancing a CartPole System with Reinforcement Learning -- A Tutorial | | 0
AWD3: Dynamic Reduction of the Estimation Bias | | 0
Adversarial joint attacks on legged robots | | 0
ReaCritic: Large Reasoning Transformer-based DRL Critic-model Scaling For Heterogeneous Networks | | 0
ACPO: A Policy Optimization Algorithm for Average MDPs with Constraints | | 0
Adversarial Exploration Strategy for Self-Supervised Imitation Learning | | 0
Distilling Deep RL Models Into Interpretable Neuro-Fuzzy Systems | | 0
Attention Loss Adjusted Prioritized Experience Replay | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | MEow | Average Return | 6,586.33 | | Unverified
2 | TD3 | Average Return | 5,942.55 | | Unverified
3 | SAC | Average Return | 5,208.09 | | Unverified
4 | DDPG | Average Return | 1,712.12 | | Unverified
5 | PPO | Average Return | 608.97 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SAC | Average Return | 15,836.04 | | Unverified
2 | DDPG | Average Return | 14,934.86 | | Unverified
3 | TD3 | Average Return | 12,026.73 | | Unverified
4 | MEow | Average Return | 10,981.47 | | Unverified
5 | PPO | Average Return | 6,006.11 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MEow | Average Return | 3,332.99 | | Unverified
2 | TD3 | Average Return | 3,319.98 | | Unverified
3 | SAC | Average Return | 2,882.56 | | Unverified
4 | DDPG | Average Return | 1,290.24 | | Unverified
5 | PPO | Average Return | 790.77 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MEow | Average Return | 6,923.22 | | Unverified
2 | SAC | Average Return | 6,211.5 | | Unverified
3 | PPO | Average Return | 925.89 | | Unverified
4 | TD3 | Average Return | 198.44 | | Unverified
5 | DDPG | Average Return | 139.14 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SAC | Average Return | 5,745.27 | | Unverified
2 | MEow | Average Return | 5,526.66 | | Unverified
3 | DDPG | Average Return | 2,994.54 | | Unverified
4 | PPO | Average Return | 2,739.81 | | Unverified
5 | TD3 | Average Return | 2,612.74 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TLA | Mean Reward | 5,163.54 | | Unverified
2 | AWR | Mean Reward | 5,067 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Orthogonal decision tree | Average Return | 500 | | Unverified
2 | Oblique decision tree | Average Return | 500 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TLA | Mean Reward | 9,571.99 | | Unverified
2 | AWR | Mean Reward | 9,136 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TLA | Mean Reward | 3,458.22 | | Unverified
2 | AWR | Mean Reward | 3,405 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Oblique decision tree | Average Return | 272.14 | | Unverified
2 | AWR | Average Return | 229 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Orthogonal decision tree | Average Return | -101.72 | | Unverified
2 | Oblique decision tree | Average Return | -106.02 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TLA with Hierarchical Reward Functions | Mean Reward | -125.02 | | Unverified
2 | TLA | Mean Reward | -154.92 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | AWR | Mean Reward | 5,813 | | Unverified
2 | TLA | Mean Reward | 3,878.41 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | AWR | Average Return | 4,996 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TLA | Mean Reward | 9,356.67 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TLA | Mean Reward | 1,000 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TLA | Mean Reward | 93.88 | | Unverified
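The "Average Return" and "Mean Reward" columns in these tables usually denote the undiscounted sum of rewards collected in one episode, averaged over a set of evaluation episodes. This page does not state each paper's evaluation protocol (number of episodes, seeds, training budget, deterministic or stochastic policy), so the following is only a minimal sketch of that common convention. The helper names `episode_return`, `average_return` and `return_std` are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def episode_return(rewards: Sequence[float]) -> float:
    """Undiscounted return of one episode: the plain sum of its rewards."""
    return float(sum(rewards))


def average_return(episodes: List[Sequence[float]]) -> float:
    """Mean undiscounted return across evaluation episodes."""
    if not episodes:
        raise ValueError("need at least one evaluation episode")
    return mean(episode_return(ep) for ep in episodes)


def return_std(episodes: List[Sequence[float]]) -> float:
    """Population standard deviation of per-episode returns."""
    if not episodes:
        raise ValueError("need at least one evaluation episode")
    return pstdev([episode_return(ep) for ep in episodes])


if __name__ == "__main__":
    # Hypothetical per-step rewards from three evaluation episodes.
    episodes = [[1.0, 1.0, 1.0], [2.0, 2.0], [5.0]]
    print(average_return(episodes))  # 4.0
    print(round(return_std(episodes), 3))  # 0.816
```

The three episodes have returns 3, 4 and 5, so the average return is 4.0. Reporting the spread alongside the mean is common practice, since a single averaged number hides how much results vary across episodes and seeds.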