SOTAVerified

OpenAI Gym

An open-source toolkit from OpenAI that implements several reinforcement learning benchmarks, including classic control, Atari, robotics, and MuJoCo tasks.

(Description by Evolutionary learning of interpretable decision trees)
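
As a rough illustration of how benchmarks of this kind are typically driven, the sketch below mimics the classic Gym interaction loop (`reset()` returning an observation, `step(action)` returning observation, reward, done flag, and info) and averages the per-episode return, which is the sort of quantity reported as "Average Return" in the benchmark tables. `ToyCorridorEnv`, `run_episode`, and `average_return` are hypothetical names invented for this example and are not part of Gym; newer Gym and Gymnasium releases use a slightly different signature (`reset()` returns an observation and an info dict, and `step()` returns separate terminated and truncated flags), so treat this as a sketch under those assumptions rather than the library's API.

```python
import random
from typing import Callable, Tuple


class ToyCorridorEnv:
    """Hypothetical stand-in environment (NOT part of Gym).

    It imitates the classic Gym interface: reset() -> observation and
    step(action) -> (observation, reward, done, info).
    """

    def __init__(self, length: int = 5, max_steps: int = 20) -> None:
        self.length = length
        self.max_steps = max_steps
        self.position = 0
        self.steps = 0

    def reset(self) -> int:
        # Start every episode at the left end of the corridor.
        self.position = 0
        self.steps = 0
        return self.position

    def step(self, action: int) -> Tuple[int, float, bool, dict]:
        # Action 1 moves right; any other action moves left (clamped at 0).
        move = 1 if action == 1 else -1
        self.position = max(0, self.position + move)
        self.steps += 1
        reached_goal = self.position >= self.length
        # Small penalty per step, +1 on reaching the goal.
        reward = 1.0 if reached_goal else -0.1
        done = reached_goal or self.steps >= self.max_steps
        return self.position, reward, done, {}


def run_episode(env: ToyCorridorEnv, policy: Callable[[int], int]) -> float:
    """Roll out one episode and return the undiscounted sum of rewards."""
    obs = env.reset()
    done = False
    episode_return = 0.0
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        episode_return += reward
    return episode_return


def average_return(env: ToyCorridorEnv, policy: Callable[[int], int],
                   episodes: int = 10) -> float:
    """Mean episode return over a fixed number of evaluation episodes."""
    returns = [run_episode(env, policy) for _ in range(episodes)]
    return sum(returns) / len(returns)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the random policy is repeatable
    env = ToyCorridorEnv()
    print("always-right:", average_return(env, lambda obs: 1))
    print("random:", average_return(env, lambda obs: rng.choice([0, 1])))
```

With a real Gym environment, the same loop would apply after swapping the toy class for an environment created by the library and adapting the `reset`/`step` return values to the installed version.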


Papers

Showing 1-50 of 382 papers

| Title | Status | Hype |
| --- | --- | --- |
| Mitigating Plasticity Loss in Continual Reinforcement Learning by Reducing Churn | | 0 |
| HDDLGym: A Tool for Studying Multi-Agent Hierarchical Problems Defined in HDDL with OpenAI Gym | Code | 0 |
| STITCH-OPE: Trajectory Stitching with Guided Diffusion for Off-Policy Evaluation | | 0 |
| Improving the Data-efficiency of Reinforcement Learning by Warm-starting with LLM | Code | 0 |
| ReaCritic: Large Reasoning Transformer-based DRL Critic-model Scaling For Heterogeneous Networks | | 0 |
| IN-RIL: Interleaved Reinforcement and Imitation Learning for Policy Fine-Tuning | Code | 0 |
| Mining-Gym: A Configurable RL Benchmarking Environment for Truck Dispatch Scheduling | Code | 0 |
| Optimizing 2D+1 Packing in Constrained Environments Using Deep Reinforcement Learning | | 0 |
| Low-cost Real-world Implementation of the Swing-up Pendulum for Deep Reinforcement Learning Experiments | | 0 |
| Value-Based Deep RL Scales Predictably | | 0 |
| Illuminating Spaces: Deep Reinforcement Learning and Laser-Wall Partitioning for Architectural Layout Generation | | 0 |
| Session-Level Dynamic Ad Load Optimization using Offline Robust Reinforcement Learning | | 0 |
| Robustness Evaluation of Offline Reinforcement Learning for Robot Control Against Action Perturbations | | 0 |
| Stealing That Free Lunch: Exposing the Limits of Dyna-Style Reinforcement Learning | | 0 |
| A quantum-classical reinforcement learning model to play Atari games | Code | 0 |
| Optimizing Sensor Redundancy in Sequential Decision-Making Problems | | 0 |
| Creating Hierarchical Dispositions of Needs in an Agent | Code | 0 |
| A Multi-Agent Reinforcement Learning Testbed for Cognitive Radio Applications | | 0 |
| Asymptotic Analysis of Sample-averaged Q-learning | | 0 |
| The Smart Buildings Control Suite: A Diverse Open Source Benchmark to Evaluate and Scale HVAC Control Policies for Sustainability | | 0 |
| MAGICS: Adversarial RL with Minimax Actors Guided by Implicit Critic Stackelberg for Convergent Neural Synthesis of Robot Safety | | 0 |
| Double Successive Over-Relaxation Q-Learning with an Extension to Deep Reinforcement Learning | Code | 0 |
| HistoGym: A Reinforcement Learning Environment for Histopathological Image Analysis | Code | 0 |
| Adaptive Planning with Generative Models under Uncertainty | | 0 |
| Enhancing Hardware Fault Tolerance in Machines with Reinforcement Learning Policy Gradient Algorithms | | 0 |
| A Comprehensive Guide to Combining R and Python code for Data Science, Machine Learning and Reinforcement Learning | | 0 |
| Mamba as Decision Maker: Exploring Multi-scale Sequence Modeling in Offline Reinforcement Learning | Code | 1 |
| OMPO: A Unified Framework for RL under Policy and Dynamics Shifts | Code | 1 |
| Maximum Entropy Reinforcement Learning via Energy-Based Normalizing Flow | Code | 1 |
| Traffic control using intelligent timing of traffic lights with reinforcement learning technique and real-time processing of surveillance camera images | | 0 |
| Decision Mamba Architectures | Code | 0 |
| SwiftRL: Towards Efficient Reinforcement Learning on Real Processing-In-Memory Systems | Code | 0 |
| Off-OAB: Off-Policy Policy Gradient Method with Optimal Action-Dependent Baseline | | 0 |
| Airlift Challenge: A Competition for Optimizing Cargo Delivery | | 0 |
| Enhancing Privacy and Security of Autonomous UAV Navigation | | 0 |
| HomeLabGym: A real-world testbed for home energy management systems | | 0 |
| Noisy Spiking Actor Network for Exploration | | 0 |
| QF-tuner: Breaking Tradition in Reinforcement Learning | | 0 |
| MORE-3S:Multimodal-based Offline Reinforcement Learning with Shared Semantic Spaces | Code | 0 |
| Easy as ABCs: Unifying Boltzmann Q-Learning and Counterfactual Regret Minimization | | 0 |
| Scilab-RL: A software framework for efficient reinforcement learning and cognitive modeling research | | 0 |
| MultiSlot ReRanker: A Generic Model-based Re-Ranking Framework in Recommendation Systems | | 0 |
| Decision Making in Non-Stationary Environments with Policy-Augmented Search | Code | 0 |
| A Closed-Loop Multi-perspective Visual Servoing Approach with Reinforcement Learning | | 0 |
| RFRL Gym: A Reinforcement Learning Testbed for Cognitive Radio Applications | Code | 1 |
| Investigating the Performance and Reliability, of the Q-Learning Algorithm in Various Unknown Environments | Code | 0 |
| Peer Learning: Learning Complex Policies in Groups from Scratch via Action Recommendations | Code | 1 |
| LLF-Bench: Benchmark for Interactive Learning from Language Feedback | Code | 1 |
| Efficient Parallel Reinforcement Learning Framework using the Reactor Model | Code | 0 |
| Can language agents be alternatives to PPO? A Preliminary Empirical Study On OpenAI Gym | Code | 1 |

Benchmark Results

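| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | MEow | Average Return | 6,586.33 | | Unverified |
| 2 | TD3 | Average Return | 5,942.55 | | Unverified |
| 3 | SAC | Average Return | 5,208.09 | | Unverified |
| 4 | DDPG | Average Return | 1,712.12 | | Unverified |
| 5 | PPO | Average Return | 608.97 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | SAC | Average Return | 15,836.04 | | Unverified |
| 2 | DDPG | Average Return | 14,934.86 | | Unverified |
| 3 | TD3 | Average Return | 12,026.73 | | Unverified |
| 4 | MEow | Average Return | 10,981.47 | | Unverified |
| 5 | PPO | Average Return | 6,006.11 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | MEow | Average Return | 3,332.99 | | Unverified |
| 2 | TD3 | Average Return | 3,319.98 | | Unverified |
| 3 | SAC | Average Return | 2,882.56 | | Unverified |
| 4 | DDPG | Average Return | 1,290.24 | | Unverified |
| 5 | PPO | Average Return | 790.77 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | MEow | Average Return | 6,923.22 | | Unverified |
| 2 | SAC | Average Return | 6,211.5 | | Unverified |
| 3 | PPO | Average Return | 925.89 | | Unverified |
| 4 | TD3 | Average Return | 198.44 | | Unverified |
| 5 | DDPG | Average Return | 139.14 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | SAC | Average Return | 5,745.27 | | Unverified |
| 2 | MEow | Average Return | 5,526.66 | | Unverified |
| 3 | DDPG | Average Return | 2,994.54 | | Unverified |
| 4 | PPO | Average Return | 2,739.81 | | Unverified |
| 5 | TD3 | Average Return | 2,612.74 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | TLA | Mean Reward | 5,163.54 | | Unverified |
| 2 | AWR | Mean Reward | 5,067 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Orthogonal decision tree | Average Return | 500 | | Unverified |
| 2 | Oblique decision tree | Average Return | 500 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | TLA | Mean Reward | 9,571.99 | | Unverified |
| 2 | AWR | Mean Reward | 9,136 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | TLA | Mean Reward | 3,458.22 | | Unverified |
| 2 | AWR | Mean Reward | 3,405 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Oblique decision tree | Average Return | 272.14 | | Unverified |
| 2 | AWR | Average Return | 229 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Orthogonal decision tree | Average Return | -101.72 | | Unverified |
| 2 | Oblique decision tree | Average Return | -106.02 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | TLA with Hierarchical Reward Functions | Mean Reward | -125.02 | | Unverified |
| 2 | TLA | Mean Reward | -154.92 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | AWR | Mean Reward | 5,813 | | Unverified |
| 2 | TLA | Mean Reward | 3,878.41 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | AWR | Average Return | 4,996 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | TLA | Mean Reward | 9,356.67 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | TLA | Mean Reward | 1,000 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | TLA | Mean Reward | 93.88 | | Unverified |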