SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal of reinforcement learning is to find an optimal policy, that is, a decision-making strategy that maximizes the expected long-term (cumulative) reward.
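To make the agent, environment, reward, and policy concrete, here is a minimal sketch of one classic RL algorithm, tabular Q-learning, which is only one of many methods in this field. The five-cell chain environment, the `step`, `q_learning`, and `greedy_policy` helpers, and the hyperparameters (`alpha`, `gamma`, `epsilon`) are all hypothetical choices for illustration. The agent starts at the left end and only receives a reward of +1 when it reaches the rightmost cell, so a good policy should learn to move right.

```python
import random

# Hypothetical toy environment: a 1-D chain of N_STATES cells.
# The agent starts in cell 0; reaching the last cell ends the episode with reward +1.
# All other transitions give reward 0.
N_STATES = 5
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Return (next_state, reward, done) for the toy chain environment."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Learn a table of action values Q[state][action] by trial and error."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: explore at random sometimes (or on ties), else exploit.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # TD target: immediate reward plus discounted value of the best next action.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Pick the highest-valued action in every state (1 = right on ties)."""
    return [0 if left > right else 1 for left, right in q]


if __name__ == "__main__":
    q_table = q_learning()
    print("greedy action per non-terminal state:", greedy_policy(q_table)[:N_STATES - 1])
```

The discount factor `gamma` makes rewards that arrive sooner worth more, which is one common way to turn "long-term reward" into a finite quantity to maximize. After training, the greedy policy should choose "right" in every non-terminal cell.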

Papers

Showing 391-400 of 15113 papers

Title | Status | Hype
Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination | Code | 1
Deep Reinforcement Learning with Gradient Eligibility Traces | Code | 1
A Practical Two-Stage Recipe for Mathematical LLMs: Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning | Code | 1
IRanker: Towards Ranking Foundation Model | Code | 1
KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality | Code | 1
Metis-RISE: RL Incentivizes and SFT Enhances Multimodal Reasoning Model Learning | Code | 1
A Production Scheduling Framework for Reinforcement Learning Under Real-World Constraints | Code | 1
Visual Pre-Training on Unlabeled Images using Reinforcement Learning | Code | 1
RePO: Replay-Enhanced Policy Optimization | Code | 1
ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs | Code | 1
Page 40 of 1512

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 |  | Unverified
2 | PPO | Mean Normalized Performance | 0.58 |  | Unverified
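The listing does not define "Mean Normalized Performance" or name the benchmark behind it. A common convention in multi-environment RL benchmarks is to min-max normalize each environment's return with per-environment reference bounds and then average across environments. The sketch below assumes that convention; the function names and every number in `example` are hypothetical and are not taken from the results above.

```python
from statistics import mean


def normalized_score(raw_return, r_min, r_max):
    """Min-max normalize one environment's return (assumed scheme)."""
    return (raw_return - r_min) / (r_max - r_min)


def mean_normalized_performance(results):
    """results maps env name -> (raw_return, r_min, r_max); returns the mean normalized score."""
    return mean(normalized_score(*values) for values in results.values())


# Hypothetical returns and reference bounds, for illustration only.
example = {
    "env_a": (7.0, 0.0, 10.0),  # normalizes to 0.7
    "env_b": (3.0, 1.0, 5.0),   # normalizes to 0.5
}

if __name__ == "__main__":
    print(mean_normalized_performance(example))
```

Normalizing per environment keeps games with large raw score ranges from dominating the average, which is why aggregate scores like those in the table are usually reported on a 0-to-1 style scale.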