Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal of reinforcement learning is to find an optimal policy, that is, a decision-making strategy that maximizes the long-term reward.
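As a rough illustration of what "cumulative reward" can mean, the sketch below computes a discounted return over one episode's rewards. The discount factor `gamma` and the function name `discounted_return` are assumptions made for this example; the description above does not specify how rewards are accumulated, and undiscounted or average-reward formulations are also in use.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float = 0.99) -> float:
    """Return sum over t of gamma**t * rewards[t] (hypothetical helper)."""
    total = 0.0
    # Walk backwards so each step needs only one multiply and one add.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


if __name__ == "__main__":
    # Three steps with reward 1 each and gamma = 0.5: 1 + 0.5 + 0.25
    print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))
```

With `gamma` below 1, later rewards count for less than earlier ones; setting `gamma = 1.0` gives the plain sum of rewards.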

Papers

Showing 1381–1390 of 15113 papers

Title	Date	Tasks	Status	Hype
From discrete-time policies to continuous-time diffusion samplers: Asymptotic equivalences and faster training	Jan 10, 2025	Reinforcement Learning (RL)	Code Available	1
From Problem-Solving to Teaching Problem-Solving: Aligning LLMs with Pedagogy using Reinforcement Learning	May 21, 2025	Question Answering, Reinforcement Learning (RL)	Code Available	1
Active Reinforcement Learning for Robust Building Control	Dec 16, 2023	Atari Games, Game of Go	Code Available	1
Dynamic Causal Effects Evaluation in A/B Testing with a Reinforcement Learning Framework	Feb 5, 2020	reinforcement-learning, Reinforcement Learning	Code Available	1
Combinatorial Optimization with Policy Adaptation using Latent Space Search	Nov 13, 2023	Benchmarking, Combinatorial Optimization	Code Available	1
Learning to combine primitive skills: A step towards versatile robotic manipulation	Aug 2, 2019	Data Augmentation, Imitation Learning	Code Available	1
Galactic: Scaling End-to-End Reinforcement Learning for Rearrangement at 100k Steps-Per-Second	Jun 13, 2023	GPU, Reinforcement Learning (RL)	Code Available	1
Gamma and Vega Hedging Using Deep Distributional Reinforcement Learning	May 10, 2022	Distributional Reinforcement Learning, Position	Code Available	1
Aerial View Localization with Reinforcement Learning: Towards Emulating Search-and-Rescue	Sep 8, 2022	Heuristic Search, reinforcement-learning	Code Available	1
Combining Reinforcement Learning with Model Predictive Control for On-Ramp Merging	Nov 17, 2020	Autonomous Driving, Model Predictive Control	Code Available	1

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	PPG	Mean Normalized Performance	0.76	—	Unverified
2	PPO	Mean Normalized Performance	0.58	—	Unverified