SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to act in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes long-term reward.
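To make the agent, environment, reward, and policy described above concrete, here is a minimal sketch using tabular Q-learning on a toy problem. Q-learning is just one of many RL algorithms, chosen here for brevity. The corridor environment, the function names (`step`, `train`, `greedy_policy`, `discounted_return`), and all hyperparameter values are assumptions made for illustration and are not taken from any paper listed on this page.

```python
import random

# Hypothetical toy environment (an assumption for illustration):
# a corridor of N_STATES cells. The agent starts in cell 0 and
# receives a reward of 1.0 when it reaches the last cell.
N_STATES = 5
ACTIONS = (-1, +1)  # action index 0 = step left, 1 = step right


def step(state, action_index):
    """Apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + ACTIONS[action_index], 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def discounted_return(rewards, gamma):
    """Cumulative reward with discount factor gamma, computed back to front."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, max_steps=1000, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))  # explore
            else:
                best = max(q[state])  # exploit, breaking ties at random
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            next_state, reward, done = step(state, a)
            # One-step temporal-difference target; no bootstrap at terminal states.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Best action index for each non-terminal state under the learned Q-table."""
    return [max(range(len(ACTIONS)), key=lambda i: q[s][i]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    q_table = train()
    print("Greedy action per state:", greedy_policy(q_table))
```

In this sketch the learned greedy policy should prefer stepping right in every non-terminal state, since that is the shortest path to the only reward. Real benchmarks use far larger state spaces, where the table is typically replaced by a function approximator.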

Papers

Showing 5976-6000 of 15113 papers

| Title | Status | Hype |
|---|---|---|
| Implicit Offline Reinforcement Learning via Supervised Learning | | 0 |
| Deep Reinforcement Learning for Inverse Inorganic Materials Design | | 0 |
| Horizon-Free and Variance-Dependent Reinforcement Learning for Latent Markov Decision Processes | | 0 |
| Fine-Grained Session Recommendations in E-commerce using Deep Reinforcement Learning | | 0 |
| Task Phasing: Automated Curriculum Learning from Demonstrations | Code | 0 |
| Model-based Lifelong Reinforcement Learning with Bayesian Exploration | Code | 0 |
| The Pump Scheduling Problem: A Real-World Scenario for Reinforcement Learning | Code | 0 |
| Safe Policy Improvement in Constrained Markov Decision Processes | | 0 |
| Robust Imitation via Mirror Descent Inverse Reinforcement Learning | | 0 |
| Provably Safe Reinforcement Learning via Action Projection using Reachability Analysis and Polynomial Zonotopes | | 0 |
| Scaling Laws for Reward Model Overoptimization | | 0 |
| Palm up: Playing in the Latent Manifold for Unsupervised Pretraining | | 0 |
| Robust Offline Reinforcement Learning with Gradient Penalty and Constraint Relaxation | | 0 |
| Robotic Table Wiping via Reinforcement Learning and Whole-body Trajectory Optimization | | 0 |
| Robot Navigation with Reinforcement Learned Path Generation and Fine-Tuned Motion Control | | 0 |
| Oracles & Followers: Stackelberg Equilibria in Deep Multi-Agent Reinforcement Learning | | 0 |
| When to Ask for Help: Proactive Interventions in Autonomous Reinforcement Learning | Code | 0 |
| On the Power of Pre-training for Generalization in RL: Provable Benefits and Hardness | | 0 |
| Learning Preferences for Interactive Autonomy | Code | 0 |
| Integrated Decision and Control for High-Level Automated Vehicles by Mixed Policy Gradient and Its Experiment Verification | | 0 |
| A Reinforcement Learning Approach in Multi-Phase Second-Price Auction Design | | 0 |
| Hierarchical Reinforcement Learning for Furniture Layout in Virtual Indoor Scenes | | 0 |
| CLUTR: Curriculum Learning via Unsupervised Task Representation Learning | Code | 0 |
| CEIP: Combining Explicit and Implicit Priors for Reinforcement Learning with Demonstrations | Code | 0 |
| Unpacking Reward Shaping: Understanding the Benefits of Reward Engineering on Sample Complexity | | 0 |
Page 240 of 605

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PPG | Mean Normalized Performance | 0.76 | | Unverified |
| 2 | PPO | Mean Normalized Performance | 0.58 | | Unverified |