SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment to maximize a cumulative reward signal. The agent interacts with the environment and learns by receiving feedback in the form of rewards or punishments for its actions. The goal of reinforcement learning is to find the optimal policy or decision-making strategy that maximizes the long-term reward.
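
The interaction loop described above can be sketched in a few lines of Python. This is only an illustrative sketch: the `Corridor` environment, the `train` and `greedy_policy` helpers, and all hyperparameters are hypothetical names and values chosen for the example, and tabular Q-learning is just one of many algorithms that fit this description. The discount factor `gamma` is one common way to define "long-term" reward.

```python
import random


class Corridor:
    """Hypothetical toy environment: states 0..4, reward 1.0 on reaching state 4."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Action 0 moves left, action 1 moves right; walls clamp the position.
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration (assumed settings)."""
    rng = random.Random(seed)
    env = Corridor()
    q = [[0.0, 0.0] for _ in range(env.length)]  # q[state][action]
    for _ in range(episodes):
        state = env.reset()
        done = False
        steps = 0
        while not done and steps < 100:  # step cap keeps early episodes bounded
            # Explore with probability epsilon, or when both actions are tied.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = env.step(action)
            # Target: immediate reward plus discounted value of the best next action.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            steps += 1
    return q


def greedy_policy(q):
    """Pick the higher-valued action in each state (ties go to 'right')."""
    return [0 if left > right else 1 for left, right in q]
```

Calling `greedy_policy(train())` returns one action per state. In this toy setting the learned policy should prefer moving right in every non-terminal state, since that is the only route to the reward.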

Papers

Showing 4051–4075 of 15113 papers

Title | Status | Hype
Rationally Inattentive Inverse Reinforcement Learning Explains YouTube Commenting Behavior | Code | 0
Mitigating Off-Policy Bias in Actor-Critic Methods with One-Step Q-learning: A Novel Correction Approach | Code | 0
RL^2: Fast Reinforcement Learning via Slow Reinforcement Learning | Code | 0
Off-Policy Correction For Multi-Agent Reinforcement Learning | Code | 0
The Role of Deep Learning Regularizations on Actors in Offline RL | Code | 0
Multi-task Maximum Entropy Inverse Reinforcement Learning | Code | 0
Partially Observable Residual Reinforcement Learning for PV-Inverter-Based Voltage Control in Distribution Grids | Code | 0
Modular Multi-Objective Deep Reinforcement Learning with Decision Values | Code | 0
RL and Fingerprinting to Select Moving Target Defense Mechanisms for Zero-day Attacks in IoT | Code | 0
The State of Sparse Training in Deep Reinforcement Learning | Code | 0
Off-Policy Deep Reinforcement Learning with Analogous Disentangled Exploration | Code | 0
Multitask radiological modality invariant landmark localization using deep reinforcement learning | Code | 0
Sim-Anchored Learning for On-the-Fly Adaptation | Code | 0
Off-policy Evaluation in Doubly Inhomogeneous Environments | Code | 0
Read, Watch, and Move: Reinforcement Learning for Temporally Grounding Natural Language Descriptions in Videos | Code | 0
RLCard: A Toolkit for Reinforcement Learning in Card Games | Code | 0
The Value of Planning for Infinite-Horizon Model Predictive Control | Code | 0
Modular Multitask Reinforcement Learning with Policy Sketches | Code | 0
Solving the Real Robot Challenge using Deep Reinforcement Learning | Code | 0
Modular Networks Prevent Catastrophic Interference in Model-Based Multi-Task Reinforcement Learning | Code | 0
Thinking Fast and Right: Balancing Accuracy and Reasoning Length with Adaptive Rewards | Code | 0
Think-J: Learning to Think for Generative LLM-as-a-Judge | Code | 0
Real-time Adversarial Perturbations against Deep Reinforcement Learning Policies: Attacks and Defenses | Code | 0
Think Smart, Act SMARL! Analyzing Probabilistic Logic Shields for Multi-Agent Reinforcement Learning | Code | 0
Think Too Fast Nor Too Slow: The Computational Trade-off Between Planning And Reinforcement Learning | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | - | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | - | Unverified