SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal of reinforcement learning is to find an optimal policy, that is, a decision-making strategy that maximizes the expected long-term reward.
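As a rough illustration of the loop described above (act, observe a reward, update, repeat), here is a minimal sketch of tabular Q-learning. Everything in it is an assumption made for illustration and does not come from any paper listed on this page: the five-state corridor environment, the names `step`, `train`, and `greedy_policy`, and the hyperparameter values are all hypothetical.

```python
import random

N_STATES = 5        # hypothetical toy corridor: states 0..4, state 4 is the goal
ACTIONS = (-1, +1)  # action index 0 moves left, index 1 moves right


def step(state, action):
    """Toy environment: move, clip to the corridor, reward 1.0 on reaching the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values with epsilon-greedy Q-learning (illustrative settings)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action_index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.randrange(2)  # explore: pick a random action
            else:
                # exploit: pick a best-valued action, breaking ties at random
                best = max(q[state])
                a = rng.choice([i for i in range(2) if q[state][i] == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Target = immediate reward + discounted value of the next state
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Best action for each non-terminal state under the learned values."""
    return [ACTIONS[max(range(2), key=lambda i: q[s][i])] for s in range(N_STATES - 1)]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

In this toy setting the learned greedy policy should move right in every non-terminal state, since that is the only way to reach the reward. The discount factor `gamma` is what makes the agent value long-term reward rather than only the immediate one.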

Papers


Showing 211–220 of 15113 papers

Title	Date	Tasks	Status	Hype
Sim-to-Real Transfer for Mobile Robots with Reinforcement Learning: from NVIDIA Isaac Sim to Gazebo and Real ROS 2 Robots	Jan 6, 2025	Deep Reinforcement Learning, Reinforcement Learning (RL)	Code Available	2
Offline Reinforcement Learning for LLM Multi-Step Reasoning	Dec 20, 2024	GSM8K, Math	Code Available	2
Guiding Generative Protein Language Models with Reinforcement Learning	Dec 17, 2024	Diversity, reinforcement-learning	Code Available	2
Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data	Dec 10, 2024	Offline RL, Reinforcement Learning (RL)	Code Available	2
ManiSkill-HAB: A Benchmark for Low-Level Manipulation in Home Rearrangement Tasks	Dec 9, 2024	GPU, Imitation Learning	Code Available	2
Conformal Symplectic Optimization for Stable Reinforcement Learning	Dec 3, 2024	Atari Games, Deep Reinforcement Learning	Code Available	2
Revisiting Generative Policies: A Simpler Reinforcement Learning Algorithmic Perspective	Dec 2, 2024	Density Estimation, Offline RL	Code Available	2
Pretrained LLM Adapted with LoRA as a Decision Transformer for Offline RL in Quantitative Trading	Nov 26, 2024	Offline RL, parameter-efficient fine-tuning	Code Available	2
Natural Language Reinforcement Learning	Nov 21, 2024	Decision Making, reinforcement-learning	Code Available	2
AMAGO-2: Breaking the Multi-Task Barrier in Meta-Reinforcement Learning with Transformers	Nov 17, 2024	In-Context Learning, Meta-Learning	Code Available	2



Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	PPG	Mean Normalized Performance	0.76	—	Unverified
2	PPO	Mean Normalized Performance	0.58	—	Unverified