Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from the feedback it receives: positive rewards for desirable actions and negative rewards (penalties) for undesirable ones. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes long-term reward.
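The interaction loop described above can be illustrated with a minimal sketch. It uses tabular Q-learning, which is only one of many RL methods and is chosen here for brevity. The five-state chain environment, the `step` function, and all hyperparameter values are hypothetical and are not taken from any paper listed on this page.

```python
import random


def step(state, action, n_states=5):
    """Hypothetical chain environment (an assumption for illustration).

    Action 1 moves right, action 0 moves left. Reaching the last state
    yields reward 1.0 and ends the episode; every other step yields 0.0.
    """
    if action == 1:
        next_state = min(state + 1, n_states - 1)
    else:
        next_state = max(state - 1, 0)
    done = next_state == n_states - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train_q_learning(episodes=1000, n_states=5, alpha=0.1, gamma=0.9,
                     epsilon=0.1, seed=0):
    """Learn action values by interacting with the environment."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap episode length
            if rng.random() < epsilon:
                action = rng.randrange(2)  # explore
            else:
                # exploit, breaking ties between equal values at random
                best = max(q[state])
                action = rng.choice([a for a in (0, 1) if q[state][a] == best])
            next_state, reward, done = step(state, action, n_states)
            # Q-learning update: move the estimate toward the reward plus
            # the discounted value of the best next action.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Pick the highest-valued action in each state (the learned policy)."""
    return [max((0, 1), key=lambda a: row[a]) for row in q]


if __name__ == "__main__":
    q_table = train_q_learning()
    print(greedy_policy(q_table))
```

In this toy setting the learned policy should prefer moving right in every non-terminal state, since that is the shortest path to the only reward. The discount factor `gamma` is what makes the agent value long-term reward over immediate reward.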

Papers

Showing 731–740 of 15113 papers

Title	Date	Tasks	Status	Hype
Grammar and Gameplay-aligned RL for Game Description Generation with LLMs	Mar 20, 2025	reinforcement-learning, Reinforcement Learning	Unverified	0
OThink-MR1: Stimulating multimodal generalized reasoning capabilities via dynamic reinforcement learning	Mar 20, 2025	Reinforcement Learning (RL)	Unverified	0
Think or Not Think: A Study of Explicit Thinking in Rule-Based Visual Reinforcement Fine-Tuning	Mar 20, 2025	Classification, Few-Shot Learning	Code Available	2
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models	Mar 20, 2025	Benchmarking, Reinforcement Learning (RL)	Code Available	4
Fin-R1: A Large Language Model for Financial Reasoning through Reinforcement Learning	Mar 20, 2025	Decision Making, Language Modeling	Code Available	4
UAS Visual Navigation in Large and Unseen Environments via a Meta Agent	Mar 20, 2025	Incremental Learning, Meta Reinforcement Learning	Unverified	0
Comprehensive Review of Reinforcement Learning for Medical Ultrasound Imaging	Mar 19, 2025	reinforcement-learning, Reinforcement Learning	Unverified	0
Empowering Medical Multi-Agents with Clinical Consultation Flow for Dynamic Diagnosis	Mar 19, 2025	Decision Making, Diagnostic	Unverified	0
DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning	Mar 19, 2025	reinforcement-learning, Reinforcement Learning	Unverified	0
LogLLaMA: Transformer-based log anomaly detection with LLaMA	Mar 19, 2025	Anomaly Detection, Reinforcement Learning (RL)	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	PPG	Mean Normalized Performance	0.76	—	Unverified
2	PPO	Mean Normalized Performance	0.58	—	Unverified