SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes long-term reward.
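The interaction loop described above can be sketched on the simplest possible case, a two-armed bandit with no state. This is only an illustrative sketch, not a method taken from any paper listed here: the function name `run_bandit`, the fixed reward values, and the epsilon-greedy exploration rule are all assumptions chosen for brevity.

```python
import random


def run_bandit(steps=1000, epsilon=0.2, seed=0):
    """Epsilon-greedy agent on a hypothetical two-armed bandit.

    Assumed environment: arm 0 always pays 0.0, arm 1 always pays 1.0.
    """
    rng = random.Random(seed)
    rewards = [0.0, 1.0]   # assumed deterministic reward per arm
    q = [0.0, 0.0]         # running estimate of each arm's value
    counts = [0, 0]        # number of times each arm was pulled
    total = 0.0            # cumulative reward collected by the agent

    for _ in range(steps):
        # Explore with probability epsilon, otherwise act greedily.
        if rng.random() < epsilon:
            action = rng.randrange(2)
        else:
            action = max(range(2), key=lambda a: q[a])

        # The environment returns feedback for the chosen action.
        reward = rewards[action]

        # Incremental sample-average update of the value estimate.
        counts[action] += 1
        q[action] += (reward - q[action]) / counts[action]
        total += reward

    return q, total


q_values, total_reward = run_bandit()
best_action = max(range(2), key=lambda a: q_values[a])
print(best_action, q_values, total_reward)
```

Here the learned "policy" is just the arm with the highest estimated value. Full RL problems add states and delayed rewards, so the agent must estimate long-term return rather than immediate payoff.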

Papers

Showing 7276-7300 of 15113 papers

| Title | Status | Hype |
|---|---|---|
| What's Behind PPO's Collapse in Long-CoT? Value Optimization Holds the Secret | | 0 |
| What Should I Do Now? Marrying Reinforcement Learning and Symbolic Planning | | 0 |
| What Would pi* Do?: Imitation Learning via Off-Policy Reinforcement Learning | | 0 |
| (When) Are Contrastive Explanations of Reinforcement Learning Helpful? | | 0 |
| When Autonomous Systems Meet Accuracy and Transferability through AI: A Survey | | 0 |
| When Can Large Reasoning Models Save Thinking? Mechanistic Analysis of Behavioral Divergence in Reasoning | | 0 |
| When Collaborative Filtering Meets Reinforcement Learning | | 0 |
| When Do Drivers Concentrate? Attention-based Driver Behavior Modeling With Deep Reinforcement Learning | | 0 |
| When is Agnostic Reinforcement Learning Statistically Tractable? | | 0 |
| When is a Prediction Knowledge? | | 0 |
| When Is Generalizable Reinforcement Learning Tractable? | | 0 |
| When is Offline Two-Player Zero-Sum Markov Game Solvable? | | 0 |
| When Is Partially Observable Reinforcement Learning Not Scary? | | 0 |
| When is Realizability Sufficient for Off-Policy Reinforcement Learning? | | 0 |
| When Learning Is Out of Reach, Reset: Generalization in Autonomous Visuomotor Reinforcement Learning | | 0 |
| When Mining Electric Locomotives Meet Reinforcement Learning | | 0 |
| When Multiple Agents Learn to Schedule: A Distributed Radio Resource Management Framework | | 0 |
| Provably Robust Blackbox Optimization for Reinforcement Learning | | 0 |
| When should agents explore? | | 0 |
| When Should We Prefer Offline Reinforcement Learning Over Behavioral Cloning? | | 0 |
| When Simple Exploration is Sample Efficient: Identifying Sufficient Conditions for Random Exploration to Yield PAC RL Algorithms | | 0 |
| When to Go, and When to Explore: The Benefit of Post-Exploration in Intrinsic Motivation | | 0 |
| When to Localize? A Risk-Constrained Reinforcement Learning Approach | | 0 |
| When to Trust Your Data: Enhancing Dyna-Style Model-Based Reinforcement Learning With Data Filter | | 0 |
| Membership Inference Attacks Against Temporally Correlated Data in Deep Reinforcement Learning | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PPG | Mean Normalized Performance | 0.76 | | Unverified |
| 2 | PPO | Mean Normalized Performance | 0.58 | | Unverified |