
Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from the feedback it receives, in the form of rewards or penalties, for its actions. The goal of reinforcement learning is to find an optimal policy, that is, a decision-making strategy that maximizes the expected long-term reward.
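The interaction loop described above can be sketched with tabular Q-learning, one standard RL algorithm chosen here purely for illustration (the description does not prescribe any particular method). The `ChainEnv` environment, its reward of +1 at the goal, and the hyperparameters `alpha`, `gamma`, and `epsilon` are hypothetical choices made so the example is self-contained.

```python
import random


class ChainEnv:
    """Hypothetical toy environment: states 0..n-1 on a line.

    Action 0 moves left, action 1 moves right. Reaching the last state
    gives reward +1 and ends the episode; every other step gives 0.
    """

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        if action == 1:
            self.state = min(self.state + 1, self.n_states - 1)
        else:
            self.state = max(self.state - 1, 0)
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def discounted_return(rewards, gamma=0.9):
    """Cumulative reward G = r_0 + gamma*r_1 + gamma^2*r_2 + ..."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def greedy_action(q_row, rng):
    """Pick the highest-valued action, breaking ties uniformly at random."""
    best = max(q_row)
    return rng.choice([a for a, v in enumerate(q_row) if v == best])


def q_learning(env, episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Learn action values Q[state][action] from reward feedback alone."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(env.n_states)]
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # Epsilon-greedy: explore with probability epsilon, else exploit.
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = greedy_action(q[state], rng)
            next_state, reward, done = env.step(action)
            # TD target: reward plus discounted best next value (none at terminal).
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Extract the policy that picks the best-valued action in each state."""
    return [max((0, 1), key=lambda a: row[a]) for row in q]


if __name__ == "__main__":
    q = q_learning(ChainEnv())
    # Greedy action for each non-terminal state (1 = move right).
    print(greedy_policy(q)[:-1])
```

With `gamma = 0.9`, a reward of 1 received on the third step contributes 0.81 to the return, so learned values shrink with distance from the goal; the greedy policy extracted from `q` should move right in every non-terminal state.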

Papers

Showing 12951 to 13000 of 15113 papers

Title | Status | Hype
The Atari Grand Challenge Dataset | Code | 0
Skill Machines: Temporal Logic Skill Composition in Reinforcement Learning | Code | 0
Reward-Machine-Guided, Self-Paced Reinforcement Learning | Code | 0
Variance Networks: When Expectation Does Not Meet Your Expectations | Code | 0
Reinforcement Learning to Disentangle Multiqubit Quantum States from Partial Observations | Code | 0
Reinforcement Learning through Asynchronous Advantage Actor-Critic on a GPU | Code | 0
The Benefits of Model-Based Generalization in Reinforcement Learning | Code | 0
MDP Playground: An Analysis and Debug Testbed for Reinforcement Learning | Code | 0
Meta-Reinforcement Learning for Reliable Communication in THz/VLC Wireless VR Networks | Code | 0
Modular Multitask Reinforcement Learning with Policy Sketches | Code | 0
Learning to Play Text-based Adventure Games with Maximum Entropy Reinforcement Learning | Code | 0
Post-processing Networks: Method for Optimizing Pipeline Task-oriented Dialogue Systems using Reinforcement Learning | Code | 0
Reinforcement Learning on Human Decision Models for Uniquely Collaborative AI Teammates | Code | 0
MDPGT: Momentum-based Decentralized Policy Gradient Tracking | Code | 0
Off-Policy RL Algorithms Can be Sample-Efficient for Continuous Control via Sample Multiple Reuse | Code | 0
Unifying Count-Based Exploration and Intrinsic Motivation | Code | 0
SliceIt! -- A Dual Simulator Framework for Learning Robot Food Slicing | Code | 0
Posterior Sampling for Reinforcement Learning Without Episodes | Code | 0
Variance Reduction based Experience Replay for Policy Optimization | Code | 0
SLM Lab: A Comprehensive Benchmark and Modular Software Framework for Reproducible Deep Reinforcement Learning | Code | 0
The Chef's Hat Simulation Environment for Reinforcement-Learning-Based Agents | Code | 0
Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes | Code | 0
Posterior-regularized REINFORCE for Instance Selection in Distant Supervision | Code | 0
Modular Multi-Objective Deep Reinforcement Learning with Decision Values | Code | 0
SmallPlan: Leverage Small Language Models for Sequential Path Planning with Simulation-Powered, LLM-Guided Distillation | Code | 0
Unifying Interpretability and Explainability for Alzheimer's Disease Progression Prediction | Code | 0
Reinforcement Learning of Musculoskeletal Control from Functional Simulations | Code | 0
Reward-Weighted Regression Converges to a Global Optimum | Code | 0
Reinforcement Learning of Active Vision for Manipulating Objects under Occlusions | Code | 0
Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning | Code | 0
The configurable tree graph (CT-graph): measurable problems in partially observable and distal reward environments for lifelong reinforcement learning | Code | 0
Reinforcement Learning Neural Turing Machines - Revised | Code | 0
MAHALO: Unifying Offline Reinforcement Learning and Imitation Learning from Observations | Code | 0
Smart Imitator: Learning from Imperfect Clinical Decisions | Code | 0
Reinforcement Learning In Two Player Zero Sum Simultaneous Action Games | Code | 0
MDP environments for the OpenAI Gym | Code | 0
Smart Magnetic Microrobots Learn to Swim with Deep Reinforcement Learning | Code | 0
Towards Understanding the Link Between Modularity and Performance in Neural Networks for Reinforcement Learning | Code | 0
Toybox: A Suite of Environments for Experimental Evaluation of Deep Reinforcement Learning | Code | 0
ToyBox: Better Atari Environments for Testing Reinforcement Learning Agents | Code | 0
Post Reinforcement Learning Inference | Code | 0
Reinforcement Learning Increases Wind Farm Power Production by Enabling Closed-Loop Collaborative Control | Code | 0
Off-policy Evaluation in Doubly Inhomogeneous Environments | Code | 0
The Distributional Reward Critic Framework for Reinforcement Learning Under Perturbed Rewards | Code | 0
SME-Net: Sparse Motion Estimation for Parametric Video Prediction Through Reinforcement Learning | Code | 0
SMILe: Scalable Meta Inverse Reinforcement Learning through Context-Conditional Policies | Code | 0
Risk-Aware Active Inverse Reinforcement Learning | Code | 0
SMiRL: Surprise Minimizing Reinforcement Learning in Unstable Environments | Code | 0
SMIX(λ): Enhancing Centralized Value Functions for Cooperative Multi-Agent Reinforcement Learning | Code | 0
Risk-Aware Reward Shaping of Reinforcement Learning Agents for Autonomous Driving | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | - | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | - | Unverified
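The listing does not say how "Mean Normalized Performance" is computed. Under one common convention, assumed here and not taken from the benchmark, each environment's raw return is min-max normalized against per-environment reference bounds and the normalized scores are then averaged with equal weight. The environment names, returns, and bounds below are hypothetical and are not the data behind the PPG or PPO numbers above.

```python
# Assumption: per-environment min-max normalization followed by an
# unweighted mean; the benchmark's actual protocol is not specified here.
def normalized_score(raw_return, r_min, r_max):
    """Rescale a raw return relative to hypothetical reference bounds."""
    return (raw_return - r_min) / (r_max - r_min)


def mean_normalized_performance(results, bounds):
    """Average the normalized scores over all environments in `results`."""
    scores = [normalized_score(results[env], *bounds[env]) for env in results]
    return sum(scores) / len(scores)


# Hypothetical environments and numbers, for illustration only.
results = {"env_a": 7.5, "env_b": 20.0}
bounds = {"env_a": (0.0, 10.0), "env_b": (10.0, 30.0)}
print(mean_normalized_performance(results, bounds))  # 0.625
```

Normalizing before averaging keeps environments with large raw reward scales from dominating the mean, which is why a single number can summarize performance across many games.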