Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from the feedback it receives, in the form of rewards or penalties, for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes the expected long-term reward.
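
The interaction loop described above can be sketched with tabular Q-learning, one of many RL algorithms. This is a minimal illustration under stated assumptions, not a method taken from any paper listed here: the 5-state chain environment, the hyperparameters, and the names `train_q_learning` and `greedy_policy` are all hypothetical.

```python
import random


def train_q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
                     epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D chain (an assumption).

    States are 0..n_states-1. Action 0 moves left, action 1 moves right.
    Entering the last state yields reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]
    goal = n_states - 1

    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap episode length
            # Epsilon-greedy action selection with random tie-breaking.
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                best = max(q[state])
                action = rng.choice(
                    [a for a in (0, 1) if q[state][a] == best])

            # Environment step: the left wall at state 0 is reflecting.
            next_state = max(0, state - 1) if action == 0 else state + 1
            done = next_state == goal
            reward = 1.0 if done else 0.0

            # Q-learning update toward reward plus discounted best
            # next-state value (no bootstrapping from a terminal state).
            bootstrap = 0.0 if done else gamma * max(q[next_state])
            q[state][action] += alpha * (reward + bootstrap
                                         - q[state][action])

            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Pick the higher-valued action in each state (ties go to left)."""
    return [1 if row[1] > row[0] else 0 for row in q]


if __name__ == "__main__":
    q_table = train_q_learning()
    print(greedy_policy(q_table)[:-1])  # policy for non-terminal states
```

Here the reward signal is the only feedback the agent gets, and the learned greedy policy is the "decision-making strategy" in the sense used above. The discount factor `gamma` is what makes the objective long-term: rewards further in the future count for less, but still count.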

Papers

Showing 2511–2520 of 15113 papers

Title	Date	Tasks	Status
Dynamic Sampling that Adapts: Iterative DPO for Self-Aware Mathematical Reasoning	May 22, 2025	Mathematical Reasoning, Reinforcement Learning (RL)	Unverified
PyTupli: A Scalable Infrastructure for Collaborative Offline Reinforcement Learning Projects	May 22, 2025	Offline RL, Reinforcement Learning (RL)	Code Available
Strategically Linked Decisions in Long-Term Planning and Reinforcement Learning	May 22, 2025	Reinforcement Learning (RL)	Unverified
SATURN: SAT-based Reinforcement Learning to Unleash Language Model Reasoning	May 22, 2025	Language Modeling, Language Modelling	Code Available
VARD: Efficient and Dense Fine-Tuning for Diffusion Models with Value-based RL	May 21, 2025	Reinforcement Learning (RL)	Unverified
Average Reward Reinforcement Learning for Omega-Regular and Mean-Payoff Objectives	May 21, 2025	Reinforcement Learning (RL)	Unverified
Multiple Weaks Win Single Strong: Large Language Models Ensemble Weak Reinforcement Learning Agents into a Supreme One	May 21, 2025	Model Selection, Reinforcement Learning (RL)	Unverified
Reward Is Enough: LLMs Are In-Context Reinforcement Learners	May 21, 2025	Large Language Model, Reinforcement Learning (RL)	Unverified
Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning	May 21, 2025	Reinforcement Learning (RL), Visual Reasoning	Unverified
Learning-based Autonomous Oversteer Control and Collision Avoidance	May 21, 2025	Autonomous Driving, Collision Avoidance	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	PPG	Mean Normalized Performance	0.76	—	Unverified
2	PPO	Mean Normalized Performance	0.58	—	Unverified