SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent learns by interacting with the environment: each action produces feedback in the form of a reward or a penalty. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes long-term reward.
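The interaction loop described above can be illustrated with a minimal sketch. It is not taken from any paper listed on this page. It assumes the simplest possible setting, a two-armed bandit (a single-state environment), and a standard epsilon-greedy agent. The function name `run_bandit`, the arm payout probabilities, and all hyperparameters are hypothetical values chosen for illustration.

```python
import random


def run_bandit(arm_means=(0.2, 0.8), steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a Bernoulli bandit (illustrative assumptions only).

    arm_means: assumed probability that each arm pays a reward of 1.
    Returns the agent's value estimates, pull counts, and total reward.
    """
    rng = random.Random(seed)
    estimates = [0.0] * len(arm_means)  # running mean reward per action
    counts = [0] * len(arm_means)       # number of times each action was taken
    total_reward = 0.0

    for _ in range(steps):
        # Policy: explore with probability epsilon, otherwise act greedily.
        if rng.random() < epsilon:
            action = rng.randrange(len(arm_means))
        else:
            action = max(range(len(arm_means)), key=lambda a: estimates[a])

        # Environment: return a reward of 1 or 0 for the chosen action.
        reward = 1.0 if rng.random() < arm_means[action] else 0.0

        # Learning: incremental update of the sample-mean estimate.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]
        total_reward += reward

    return estimates, counts, total_reward


if __name__ == "__main__":
    estimates, counts, total_reward = run_bandit()
    print("value estimates:", estimates)
    print("pull counts:", counts)
    print("total reward:", total_reward)
```

Under these assumptions the agent should end up pulling the higher-paying arm far more often than the other one. Full RL problems add states and delayed rewards, so the policy must weigh long-term consequences rather than only the immediate payout.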

Papers

Showing 3251-3275 of 15113 papers

| Title | Status | Hype |
| --- | --- | --- |
| QuadSwarm: A Modular Multi-Quadrotor Simulator for Deep Reinforcement Learning with Direct Thrust Control | Code | 2 |
| Langevin Thompson Sampling with Logarithmic Communication: Bandits and Reinforcement Learning | | 0 |
| Datasets and Benchmarks for Offline Safe Reinforcement Learning | Code | 2 |
| Real-Time Network-Level Traffic Signal Control: An Explicit Multiagent Coordination Method | | 0 |
| Predictive Maneuver Planning with Deep Reinforcement Learning (PMP-DRL) for comfortable and safe autonomous driving | | 0 |
| Off-policy Evaluation in Doubly Inhomogeneous Environments | Code | 0 |
| Provably Efficient Offline Reinforcement Learning with Perturbed Data Sources | | 0 |
| A reinforcement learning strategy for p-adaptation in high order solvers | | 0 |
| Skill-Critic: Refining Learned Skills for Hierarchical Reinforcement Learning | | 0 |
| Simple Embodied Language Learning as a Byproduct of Meta-Reinforcement Learning | | 0 |
| Unified Off-Policy Learning to Rank: a Reinforcement Learning Perspective | Code | 0 |
| Multi-market Energy Optimization with Renewables via Reinforcement Learning | | 0 |
| Can ChatGPT Enable ITS? The Case of Mixed Traffic Control via Reinforcement Learning | Code | 0 |
| Pruning the Way to Reliable Policies: A Multi-Objective Deep Q-Learning Approach to Critical Care | | 0 |
| Galactic: Scaling End-to-End Reinforcement Learning for Rearrangement at 100k Steps-Per-Second | Code | 1 |
| Kernelized Reinforcement Learning with Order Optimal Regret Bounds | | 0 |
| A Simple Unified Uncertainty-Guided Framework for Offline-to-Online Reinforcement Learning | | 0 |
| A Primal-Dual-Critic Algorithm for Offline Constrained Reinforcement Learning | | 0 |
| DenseLight: Efficient Control for Large-scale Traffic Signals with Dense Feedback | Code | 0 |
| Online Prototype Alignment for Few-shot Policy Transfer | Code | 0 |
| Robust Reinforcement Learning through Efficient Adversarial Herding | | 0 |
| Combining Reinforcement Learning and Barrier Functions for Adaptive Risk Management in Portfolio Optimization | | 0 |
| Diverse Projection Ensembles for Distributional Reinforcement Learning | | 0 |
| ENOTO: Improving Offline-to-Online Reinforcement Learning with Q-Ensembles | | 0 |
| Tackling Heavy-Tailed Rewards in Reinforcement Learning with Function Approximation: Minimax Optimal and Instance-Dependent Regret Bounds | | 0 |
Page 131 of 605

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PPG | Mean Normalized Performance | 0.76 | | Unverified |
| 2 | PPO | Mean Normalized Performance | 0.58 | | Unverified |