SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes the expected long-term reward.
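The agent-environment loop described above can be sketched with tabular Q-learning, which is only one of many RL algorithms and is chosen here for brevity. Everything in this sketch is an illustrative assumption rather than something taken from the listed papers: the five-state corridor environment, the function names `step`, `train`, and `greedy_policy`, and the hyperparameter values.

```python
import random


def step(state, action, n_states=5):
    """Hypothetical 1-D corridor: action 1 moves right, action 0 moves left.

    Reaching the right-most state ends the episode with reward 1.0;
    every other transition gives reward 0.0.
    """
    if action == 1:
        next_state = min(state + 1, n_states - 1)
    else:
        next_state = max(state - 1, 0)
    done = next_state == n_states - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, n_states=5, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Learn a table of action values q[state][action] by trial and error."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: explore at random sometimes, and on ties.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action, n_states)
            # Target = immediate reward + discounted best future value.
            future = 0.0 if done else gamma * max(q[next_state])
            q[state][action] += alpha * (reward + future - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Pick the highest-valued action in each state."""
    return [max(range(2), key=lambda a: row[a]) for row in q]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

After training, the greedy policy for the non-terminal states 0 to 3 should select action 1 (move right), since that is the shortest path to the reward. The entry for the terminal state carries no meaning because no action is ever taken there.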

Papers

Showing 421-430 of 15113 papers

Title | Status | Hype
Structured Reinforcement Learning for Combinatorial Decision-Making | Code | 1
SeRL: Self-Play Reinforcement Learning for Large Language Models with Limited Data | Code | 1
Enhancing Efficiency and Exploration in Reinforcement Learning for LLMs | Code | 1
Reinforcement Learning for Ballbot Navigation in Uneven Terrain | Code | 1
The Cell Must Go On: Agar.io for Continual Reinforcement Learning | Code | 1
Co-Reinforcement Learning for Unified Multimodal Understanding and Generation | Code | 1
Towards Revealing the Effectiveness of Small-Scale Fine-tuning in R1-style Reinforcement Learning | Code | 1
Think-RM: Enabling Long-Horizon Reasoning in Generative Reward Models | Code | 1
RLBenchNet: The Right Network for the Right Reinforcement Learning Task | Code | 1
From Problem-Solving to Teaching Problem-Solving: Aligning LLMs with Pedagogy using Reinforcement Learning | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | | Unverified