SOTAVerified

OpenAI Gym

An open-source toolkit from OpenAI that implements several reinforcement learning benchmarks, including classic control, Atari, robotics, and MuJoCo tasks.

(Description by Evolutionary learning of interpretable decision trees)

Papers

Showing 301–350 of 382 papers

Title | Status | Hype
Implementing Reinforcement Learning Algorithms in Retail Supply Chains with OpenAI Gym Toolkit | | 0
Implicit Sensing in Traffic Optimization: Advanced Deep Reinforcement Learning Techniques | | 0
Implicit Two-Tower Policies | | 0
Improving Reinforcement Learning with Human Assistance: An Argument for Human Subject Studies with HIPPO Gym | | 0
Influence-Based Reinforcement Learning for Intrinsically-Motivated Agents | | 0
In Support of Over-Parametrization in Deep Reinforcement Learning: an Empirical Study | | 0
Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning | | 0
Investigating Reinforcement Learning Agents for Continuous State Space Environments | | 0
LagNetViP: A Lagrangian Neural Network for Video Prediction | | 0
Multitask Neuroevolution for Reinforcement Learning with Long and Short Episodes | | 0
Learn a Prior for RHEA for Better Online Planning | | 0
Learning Environment Models with Continuous Stochastic Dynamics | | 0
Learning from Demonstrations using Signal Temporal Logic | | 0
Learning Gaussian Policies from Corrective Human Feedback | | 0
Local Environment Poisoning Attacks on Federated Reinforcement Learning | | 0
Long N-step Surrogate Stage Reward to Reduce Variances of Deep Reinforcement Learning in Complex Problems | | 0
Optimizing with Low Budgets: a Comparison on the Black-box Optimization Benchmarking Suite and OpenAI Gym | | 0
Low-cost Real-world Implementation of the Swing-up Pendulum for Deep Reinforcement Learning Experiments | | 0
Machine Learning aided Crop Yield Optimization | | 0
MADRaS : Multi Agent Driving Simulator | | 0
MAGICS: Adversarial RL with Minimax Actors Guided by Implicit Critic Stackelberg for Convergent Neural Synthesis of Robot Safety | | 0
MARTI-4: new model of human brain, considering neocortex and basal ganglia -- learns to play Atari game by reinforcement learning on a single CPU | | 0
MDP Playground: Controlling Orthogonal Dimensions of Hardness in Toy Environments | | 0
Mitigating Plasticity Loss in Continual Reinforcement Learning by Reducing Churn | | 0
Model-based actor-critic: GAN (model generator) + DRL (actor-critic) => AGI | | 0
Robust Reinforcement Learning using Least Squares Policy Iteration with Provable Performance Guarantees | | 0
Modelling non-reinforced preferences using selective attention | | 0
MoET: Interpretable and Verifiable Reinforcement Learning via Mixture of Expert Trees | | 0
MR-iNet Gym: Framework for Edge Deployment of Deep Reinforcement Learning on Embedded Software Defined Radio | | 0
Multi-Agent Reinforcement Learning via Adaptive Kalman Temporal Difference and Successor Representation | | 0
MultiSlot ReRanker: A Generic Model-based Re-Ranking Framework in Recommendation Systems | | 0
Compositional Q-learning for electrolyte repletion with imbalanced patient sub-populations | | 0
Nested Policy Reinforcement Learning for Clinical Decision Support | | 0
Neural architecture impact on identifying temporally extended Reinforcement Learning tasks | | 0
Neural Episodic Control with State Abstraction | | 0
Neuron as an Agent | | 0
Noisy Spiking Actor Network for Exploration | | 0
Non-Markovian Control with Gated End-to-End Memory Policy Networks | | 0
Offline Inverse Reinforcement Learning | | 0
Off-OAB: Off-Policy Policy Gradient Method with Optimal Action-Dependent Baseline | | 0
Off-Policy Reinforcement Learning with Loss Function Weighted by Temporal Difference Error | | 0
On Combining Expert Demonstrations in Imitation Learning via Optimal Transport | | 0
Online Robust Policy Learning in the Presence of Unknown Adversaries | | 0
Asymptotic Analysis of Sample-averaged Q-learning | | 0
Optimism is All You Need: Model-Based Imitation Learning From Observation Alone | | 0
Optimizing 2D+1 Packing in Constrained Environments Using Deep Reinforcement Learning | | 0
Optimizing Sensor Redundancy in Sequential Decision-Making Problems | | 0
Photonic Quantum Policy Learning in OpenAI Gym | | 0
Policy Gradient using Weak Derivatives for Reinforcement Learning | | 0
Population-coding and Dynamic-neurons improved Spiking Actor Network for Reinforcement Learning | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | MEow | Average Return | 6,586.33 | | Unverified
2 | TD3 | Average Return | 5,942.55 | | Unverified
3 | SAC | Average Return | 5,208.09 | | Unverified
4 | DDPG | Average Return | 1,712.12 | | Unverified
5 | PPO | Average Return | 608.97 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SAC | Average Return | 15,836.04 | | Unverified
2 | DDPG | Average Return | 14,934.86 | | Unverified
3 | TD3 | Average Return | 12,026.73 | | Unverified
4 | MEow | Average Return | 10,981.47 | | Unverified
5 | PPO | Average Return | 6,006.11 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MEow | Average Return | 3,332.99 | | Unverified
2 | TD3 | Average Return | 3,319.98 | | Unverified
3 | SAC | Average Return | 2,882.56 | | Unverified
4 | DDPG | Average Return | 1,290.24 | | Unverified
5 | PPO | Average Return | 790.77 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MEow | Average Return | 6,923.22 | | Unverified
2 | SAC | Average Return | 6,211.5 | | Unverified
3 | PPO | Average Return | 925.89 | | Unverified
4 | TD3 | Average Return | 198.44 | | Unverified
5 | DDPG | Average Return | 139.14 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SAC | Average Return | 5,745.27 | | Unverified
2 | MEow | Average Return | 5,526.66 | | Unverified
3 | DDPG | Average Return | 2,994.54 | | Unverified
4 | PPO | Average Return | 2,739.81 | | Unverified
5 | TD3 | Average Return | 2,612.74 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TLA | Mean Reward | 5,163.54 | | Unverified
2 | AWR | Mean Reward | 5,067 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Orthogonal decision tree | Average Return | 500 | | Unverified
2 | Oblique decision tree | Average Return | 500 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TLA | Mean Reward | 9,571.99 | | Unverified
2 | AWR | Mean Reward | 9,136 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TLA | Mean Reward | 3,458.22 | | Unverified
2 | AWR | Mean Reward | 3,405 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Oblique decision tree | Average Return | 272.14 | | Unverified
2 | AWR | Average Return | 229 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Orthogonal decision tree | Average Return | -101.72 | | Unverified
2 | Oblique decision tree | Average Return | -106.02 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TLA with Hierarchical Reward Functions | Mean Reward | -125.02 | | Unverified
2 | TLA | Mean Reward | -154.92 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | AWR | Mean Reward | 5,813 | | Unverified
2 | TLA | Mean Reward | 3,878.41 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | AWR | Average Return | 4,996 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TLA | Mean Reward | 9,356.67 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TLA | Mean Reward | 1,000 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TLA | Mean Reward | 93.88 | | Unverified