SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment by trial and error and learns from the feedback it receives for its actions: rewards, or penalties (negative rewards). The goal of reinforcement learning is to find an optimal policy, a decision-making strategy that maps states to actions, that maximizes the expected long-term cumulative reward.
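The agent-environment loop and "learning from reward feedback" described above can be illustrated with tabular Q-learning, one classic RL algorithm chosen here only as an example (the page does not prescribe any particular method). This is a minimal sketch under stated assumptions: the corridor environment, its +1 goal reward, the helper names (`step`, `best_action`, `q_learning`, `greedy_policy`), and the hyperparameters `alpha`, `gamma`, and `epsilon` are all hypothetical.

```python
import random

# Hypothetical toy environment (illustration only): a 1-D corridor of
# N_STATES cells. The agent starts in cell 0; entering the last cell ends
# the episode with reward +1, and every other transition yields reward 0.
N_STATES = 5
ACTIONS = (-1, 1)  # move left, move right


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)  # walls clamp movement
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def best_action(q, state, rng):
    """Greedy action under Q, breaking ties uniformly at random."""
    top = max(q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == top])


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning: estimate action values from reward feedback."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: usually exploit current estimates, sometimes explore.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = best_action(q, state, rng)
            next_state, reward, done = step(state, action)
            # Target: immediate reward plus discounted value of the best next action.
            future = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = next_state
    return q


def greedy_policy(q):
    """Deterministic policy read off the learned Q-table (non-terminal cells only)."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}


if __name__ == "__main__":
    print(greedy_policy(q_learning()))
```

After training, the greedy policy should choose `1` (move right) in every non-terminal cell, since that is the shortest path to the only reward. Random tie-breaking matters here: with all Q-values initialized to zero, a fixed tie-break toward "left" would keep the agent pinned against the wall and make the first reward extremely unlikely to be found.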

Papers

Showing 2271–2280 of 15113 papers

Title | Status | Hype
Scaling Up RL: Unlocking Diverse Reasoning in LLMs via Prolonged Training | | 0
Local Pairwise Distance Matching for Backpropagation-Free Reinforcement Learning | | 0
High-Throughput Distributed Reinforcement Learning via Adaptive Policy Synchronization | Code | 0
Real-Time Bayesian Detection of Drift-Evasive GNSS Spoofing in Reinforcement Learning Based UAV Deconfliction | | 0
Illuminating the Three Dogmas of Reinforcement Learning under Evolutionary Light | | 0
Personalized Exercise Recommendation with Semantically-Grounded Knowledge Tracing | Code | 0
Bridging the Gap in Vision Language Models in Identifying Unsafe Concepts Across Modalities | Code | 0
Exploring the robustness of TractOracle methods in RL-based tractography | Code | 0
The Synergy Dilemma of Long-CoT SFT and RL: Investigating Post-Training Techniques for Reasoning VLMs | | 0
Scaling RL to Long Videos | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | | Unverified
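The page does not define how "Mean Normalized Performance" is computed. One common convention, used for example by the Procgen benchmark, min-max normalizes each environment's raw return as R_norm = (R - R_min) / (R_max - R_min) using per-environment reference bounds, then averages across environments. The sketch below assumes that convention; the function names, environment names, and numbers are hypothetical and are not the values behind the PPG or PPO scores above.

```python
def normalized_score(raw, r_min, r_max):
    """Min-max normalize one environment's raw return (typically into [0, 1])."""
    return (raw - r_min) / (r_max - r_min)


def mean_normalized_performance(raw_scores, bounds):
    """Average normalized score across environments.

    raw_scores: dict env_name -> raw mean return of the agent
    bounds:     dict env_name -> (r_min, r_max) reference returns
    """
    if not raw_scores:
        raise ValueError("need at least one environment")
    scores = [normalized_score(raw_scores[env], *bounds[env]) for env in raw_scores]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical environments and numbers, for illustration only.
    raw = {"env_a": 7.0, "env_b": 5.5}
    ref = {"env_a": (0.0, 10.0), "env_b": (1.0, 10.0)}
    print(mean_normalized_performance(raw, ref))
```

Normalizing before averaging keeps environments with large raw reward scales from dominating the aggregate; under this assumption the example averages 0.7 and 0.5 to roughly 0.6.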