
Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to choose actions in an environment so as to maximize a cumulative reward signal. At each step the agent observes the state of the environment, takes an action, and receives a reward (positive or negative) as feedback on that action. The goal is to learn an optimal policy, a mapping from states to actions, that maximizes the expected long-term (typically discounted) return.
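The loop described above (observe, act, receive a reward, update) can be illustrated with a minimal sketch of tabular Q-learning, one classic RL algorithm among many. The `ChainEnv` environment, its reward values, and the hyperparameters (`alpha`, `gamma`, `epsilon`, `episodes`) are hypothetical choices made only for illustration; the `reset`/`step` method names loosely mirror common RL environment interfaces but are not tied to any specific library.

```python
import random


def discounted_return(rewards, gamma=0.99):
    """Cumulative discounted reward G = r_0 + gamma*r_1 + gamma^2*r_2 + ..."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


class ChainEnv:
    """Hypothetical toy environment: states 0..n-1 laid out on a line.

    Action 0 moves left, action 1 moves right. Reaching the last state ends
    the episode with reward +1; every other step costs -0.01.
    """

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        move = 1 if action == 1 else -1
        # Clamp the position so the agent cannot leave the chain.
        self.state = min(max(self.state + move, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        reward = 1.0 if done else -0.01
        return self.state, reward, done


def q_learning(env, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Learn a table Q[state][action] by interacting with the environment."""
    rng = random.Random(seed)
    n_actions = 2
    q = [[0.0] * n_actions for _ in range(env.n_states)]
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # Epsilon-greedy: explore with probability epsilon, else exploit.
            if rng.random() < epsilon:
                action = rng.randrange(n_actions)
            else:
                action = max(range(n_actions), key=lambda a: q[state][a])
            next_state, reward, done = env.step(action)
            # TD target r + gamma * max_a' Q(s', a'); no bootstrap at terminal.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


if __name__ == "__main__":
    q = q_learning(ChainEnv(n_states=5))
    # Greedy policy for the non-terminal states 0..3.
    greedy = [max(range(2), key=lambda a: q[s][a]) for s in range(4)]
    print(greedy)
```

Because the step penalty makes detours costly, the learned greedy policy should choose "right" (action 1) in every non-terminal state. The epsilon-greedy rule is one simple way to balance exploration against exploitation; Q-learning is off-policy, so it learns the value of the greedy policy even while acting partly at random.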

Papers


Showing 2481–2490 of 15113 papers

Title	Date	Tasks	Status	Hype
Adaptive Q-Aid for Conditional Supervised Learning in Offline Reinforcement Learning	Feb 3, 2024	Offline RL, Reinforcement Learning (RL)	Unverified	0
A Survey of Constraint Formulations in Safe Reinforcement Learning	Feb 3, 2024	Diversity, reinforcement-learning	Unverified	0
Rethinking the Role of Proxy Rewards in Language Model Alignment	Feb 2, 2024	Language Modeling, Language Modelling	Code Available	0
The Political Preferences of LLMs	Feb 2, 2024	Reinforcement Learning (RL)	Unverified	0
The RL/LLM Taxonomy Tree: Reviewing Synergies Between Reinforcement Learning and Large Language Models	Feb 2, 2024	Reinforcement Learning (RL)	Unverified	0
An Auction-based Marketplace for Model Trading in Federated Learning	Feb 2, 2024	Federated Learning, Marketing	Unverified	0
To the Max: Reinventing Reward in Reinforcement Learning	Feb 2, 2024	reinforcement-learning, Reinforcement Learning	Code Available	0
Efficient Reinforcement Learning for Routing Jobs in Heterogeneous Queueing Systems	Feb 2, 2024	reinforcement-learning, Reinforcement Learning	Unverified	0
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback	Feb 2, 2024	Code Completion, Code Generation	Code Available	2
Expert Proximity as Surrogate Rewards for Single Demonstration Imitation Learning	Feb 1, 2024	Imitation Learning, MuJoCo	Code Available	0



Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	PPG	Mean Normalized Performance	0.76	—	Unverified
2	PPO	Mean Normalized Performance	0.58	—	Unverified