SOTAVerified

OpenAI Gym

An open-source toolkit from OpenAI that implements several reinforcement learning benchmarks, including classic control, Atari, robotics, and MuJoCo tasks.

(Description by Evolutionary learning of interpretable decision trees)
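As a rough illustration of how benchmarks of this kind are typically driven, the sketch below runs an agent-environment loop and computes an average return, the metric reported in several result tables on this page. It assumes the classic Gym convention in which `reset()` returns an observation and `step(action)` returns `(observation, reward, done, info)`. `ToyCorridorEnv`, `average_return`, and the fixed policies are hypothetical names invented for this example and are not part of Gym; the toy environment only stands in for a real one so the code runs without any dependencies.

```python
class ToyCorridorEnv:
    """Hypothetical stand-in for a Gym-style environment (not part of Gym).

    The agent starts at position 0 and must reach position `length`.
    """

    def __init__(self, length=5, max_steps=20):
        self.length = length
        self.max_steps = max_steps
        self.position = 0
        self.steps = 0

    def reset(self):
        # Classic Gym convention (assumed here): reset() returns the first observation.
        self.position = 0
        self.steps = 0
        return self.position

    def step(self, action):
        # Action 1 moves right; any other action moves left (clamped at 0).
        move = 1 if action == 1 else -1
        self.position = max(0, self.position + move)
        self.steps += 1
        reached_goal = self.position >= self.length
        done = reached_goal or self.steps >= self.max_steps
        reward = 1.0 if reached_goal else -0.1
        # Classic Gym convention (assumed here): (observation, reward, done, info).
        return self.position, reward, done, {}


def average_return(env, policy, episodes=10):
    """Run `episodes` rollouts and return the mean undiscounted episode return."""
    total = 0.0
    for _ in range(episodes):
        obs = env.reset()
        done = False
        episode_return = 0.0
        while not done:
            obs, reward, done, _info = env.step(policy(obs))
            episode_return += reward
        total += episode_return
    return total / episodes


def always_right(obs):
    # Trivial fixed policy used only for illustration.
    return 1


if __name__ == "__main__":
    print(average_return(ToyCorridorEnv(), always_right))
```

With a real Gym environment, the same loop would apply with the toy class swapped for the environment object; newer Gymnasium releases use a different return signature for `reset()` and `step()`, so the unpacking would need adjusting there.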


Papers

Showing 51–100 of 382 papers

Title | Status | Hype
Controlgym: Large-Scale Control Environments for Benchmarking Reinforcement Learning Algorithms | Code | 1
Resilient Control of Networked Microgrids using Vertical Federated Reinforcement Learning: Designs and Real-Time Test-Bed Validations | | 0
Guaranteeing Control Requirements via Reward Shaping in Reinforcement Learning | Code | 0
Bridging Dimensions: Confident Reachability for High-Dimensional Controllers | Code | 0
Repairing Learning-Enabled Controllers While Preserving What Works | Code | 0
SDGym: Low-Code Reinforcement Learning Environments using System Dynamics Models | Code | 0
Offline Retraining for Online RL: Decoupled Policy Learning to Mitigate Exploration Bias | Code | 1
Neural architecture impact on identifying temporally extended Reinforcement Learning tasks | | 0
Optimizing with Low Budgets: a Comparison on the Black-box Optimization Benchmarking Suite and OpenAI Gym | | 0
Implicit Sensing in Traffic Optimization: Advanced Deep Reinforcement Learning Techniques | | 0
gym-saturation: Gymnasium environments for saturation provers (System description) | | 0
Attention Loss Adjusted Prioritized Experience Replay | | 0
Statistically Efficient Variance Reduction with Double Policy Estimation for Off-Policy Evaluation in Sequence-Modeled Reinforcement Learning | | 0
Distributionally Robust Statistical Verification with Imprecise Neural Networks | | 0
qgym: A Gym for Training and Benchmarking RL-Based Quantum Compilation | Code | 1
On Combining Expert Demonstrations in Imitation Learning via Optimal Transport | | 0
Scaling Distributed Multi-task Reinforcement Learning with Experience Sharing | | 0
Dynamic Observation Policies in Observation Cost-Sensitive Reinforcement Learning | Code | 0
Learning Environment Models with Continuous Stochastic Dynamics | | 0
Correcting discount-factor mismatch in on-policy policy gradient methods | | 0
Comparing the Efficacy of Fine-Tuning and Meta-Learning for Few-Shot Policy Imitation | Code | 0
Deep Reinforcement Learning for ESG financial portfolio management | | 0
Mimicking Better by Matching the Approximate Action Distribution | Code | 0
Active Inference in Hebbian Learning Networks | | 0
Risk-Aware Reward Shaping of Reinforcement Learning Agents for Autonomous Driving | Code | 0
For SALE: State-Action Representation Learning for Deep Reinforcement Learning | Code | 1
Optimizing Attention and Cognitive Control Costs Using Temporally-Layered Architectures | Code | 0
Discovering Individual Rewards in Collective Behavior through Inverse Multi-Agent Reinforcement Learning | | 0
Rethinking Population-assisted Off-policy Reinforcement Learning | | 0
Gym-preCICE: Reinforcement Learning Environments for Active Flow Control | | 0
Signal Novelty Detection as an Intrinsic Reward for Robotics | Code | 0
Exact and Cost-Effective Automated Transformation of Neural Network Controllers to Decision Tree Controllers | | 0
Causal Repair of Learning-enabled Cyber-physical Systems | | 0
Generative Adversarial Neuroevolution for Control Behaviour Imitation | Code | 0
Neuroevolution of Recurrent Architectures on Control Tasks | Code | 0
Soft-Bellman Equilibrium in Affine Markov Games: Forward Solutions and Inverse Learning | Code | 0
Graph Decision Transformer | | 0
A Strategy-Oriented Bayesian Soft Actor-Critic Model | | 0
Local Environment Poisoning Attacks on Federated Reinforcement Learning | | 0
Double A3C: Deep Reinforcement Learning on OpenAI Gym Games | | 0
ACPO: A Policy Optimization Algorithm for Average MDPs with Constraints | | 0
EvoX: A Distributed GPU-accelerated Framework for Scalable Evolutionary Computation | Code | 4
Neural Episodic Control with State Abstraction | | 0
PushWorld: A benchmark for manipulation planning with tools and movable obstacles | Code | 1
Asynchronous Deep Double Duelling Q-Learning for Trading-Signal Execution in Limit Order Book Markets | | 0
Off-Policy Reinforcement Learning with Loss Function Weighted by Temporal Difference Error | | 0
Enhancing Cyber Resilience of Networked Microgrids using Vertical Federated Reinforcement Learning | | 0
Robust Policy Optimization in Deep Reinforcement Learning | Code | 0
CT-DQN: Control-Tutored Deep Reinforcement Learning | | 0
MO-Gym: A Library of Multi-Objective Reinforcement Learning Environments | Code | 2

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | MEow | Average Return | 6,586.33 | | Unverified
2 | TD3 | Average Return | 5,942.55 | | Unverified
3 | SAC | Average Return | 5,208.09 | | Unverified
4 | DDPG | Average Return | 1,712.12 | | Unverified
5 | PPO | Average Return | 608.97 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SAC | Average Return | 15,836.04 | | Unverified
2 | DDPG | Average Return | 14,934.86 | | Unverified
3 | TD3 | Average Return | 12,026.73 | | Unverified
4 | MEow | Average Return | 10,981.47 | | Unverified
5 | PPO | Average Return | 6,006.11 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MEow | Average Return | 3,332.99 | | Unverified
2 | TD3 | Average Return | 3,319.98 | | Unverified
3 | SAC | Average Return | 2,882.56 | | Unverified
4 | DDPG | Average Return | 1,290.24 | | Unverified
5 | PPO | Average Return | 790.77 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MEow | Average Return | 6,923.22 | | Unverified
2 | SAC | Average Return | 6,211.5 | | Unverified
3 | PPO | Average Return | 925.89 | | Unverified
4 | TD3 | Average Return | 198.44 | | Unverified
5 | DDPG | Average Return | 139.14 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SAC | Average Return | 5,745.27 | | Unverified
2 | MEow | Average Return | 5,526.66 | | Unverified
3 | DDPG | Average Return | 2,994.54 | | Unverified
4 | PPO | Average Return | 2,739.81 | | Unverified
5 | TD3 | Average Return | 2,612.74 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TLA | Mean Reward | 5,163.54 | | Unverified
2 | AWR | Mean Reward | 5,067 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Orthogonal decision tree | Average Return | 500 | | Unverified
2 | Oblique decision tree | Average Return | 500 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TLA | Mean Reward | 9,571.99 | | Unverified
2 | AWR | Mean Reward | 9,136 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TLA | Mean Reward | 3,458.22 | | Unverified
2 | AWR | Mean Reward | 3,405 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Oblique decision tree | Average Return | 272.14 | | Unverified
2 | AWR | Average Return | 229 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Orthogonal decision tree | Average Return | -101.72 | | Unverified
2 | Oblique decision tree | Average Return | -106.02 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TLA with Hierarchical Reward Functions | Mean Reward | -125.02 | | Unverified
2 | TLA | Mean Reward | -154.92 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | AWR | Mean Reward | 5,813 | | Unverified
2 | TLA | Mean Reward | 3,878.41 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | AWR | Average Return | 4,996 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TLA | Mean Reward | 9,356.67 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TLA | Mean Reward | 1,000 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TLA | Mean Reward | 93.88 | | Unverified