Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to act in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback, receiving rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes the expected long-term reward.
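
The interaction loop described above can be illustrated with a small sketch. It is not taken from any paper listed on this page: the five-state corridor environment, the function names (step, train, greedy_policy) and the hyperparameter values are all assumptions chosen for illustration, and tabular Q-learning is used only as one simple example of an RL algorithm.

```python
import random

# Hypothetical toy environment: a corridor of 5 states (0..4).
# The agent starts in state 0 and receives reward 1.0 on reaching state 4.
N_STATES = 5
ACTIONS = (-1, +1)  # index 0 = move left, index 1 = move right


def step(state, action):
    """Environment dynamics: return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action_index]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on episode length
            if rng.random() < epsilon:
                a = rng.randrange(2)  # explore: random action
            else:
                # exploit: greedy action, ties broken at random
                best = max(q[state])
                a = rng.choice([i for i in range(2) if q[state][i] == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Bootstrapped target: immediate reward plus discounted future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Best action (-1 or +1) for each non-terminal state."""
    return [ACTIONS[max(range(2), key=lambda i: q[s][i])]
            for s in range(N_STATES - 1)]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

In this sketch the learned policy is the list of greedy actions per state. The discount factor gamma is what makes the agent value long-term reward: states closer to the goal end up with higher Q-values, so moving right is preferred everywhere.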

Papers

Showing 201–225 of 15113 papers

Title	Date	Tasks	Status	Hype
Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models	Feb 24, 2025	GSM8K, Math	Code Available	2
Memory, Benchmark & Robots: A Benchmark for Solving Complex Tasks with Reinforcement Learning	Feb 14, 2025	Reinforcement Learning (RL), Skills Assessment	Code Available	2
Digi-Q: Learning Q-Value Functions for Training Device-Control Agents	Feb 13, 2025	Q-Learning, Reinforcement Learning (RL)	Code Available	2
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning	Feb 10, 2025	Math, Mathematical Reasoning	Code Available	2
Training Language Models to Reason Efficiently	Feb 6, 2025	Reinforcement Learning (RL)	Code Available	2
CTR-Driven Advertising Image Generation with Multimodal Large Language Models	Feb 5, 2025	Image Generation, Reinforcement Learning (RL)	Code Available	2
Reusing Embeddings: Reproducible Reward Model Research in Large Language Model Alignment without GPUs	Feb 4, 2025	Code Generation, Language Modeling	Code Available	2
Improving Retrieval-Augmented Generation through Multi-Agent Reinforcement Learning	Jan 25, 2025	Answer Generation, Multi-agent Reinforcement Learning	Code Available	2
Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling	Jan 20, 2025	Imitation Learning, Language Modeling	Code Available	2
Reasoning Language Models: A Blueprint	Jan 20, 2025	Reinforcement Learning (RL), Retrieval-augmented Generation	Code Available	2
Sim-to-Real Transfer for Mobile Robots with Reinforcement Learning: from NVIDIA Isaac Sim to Gazebo and Real ROS 2 Robots	Jan 6, 2025	Deep Reinforcement Learning, Reinforcement Learning (RL)	Code Available	2
Offline Reinforcement Learning for LLM Multi-Step Reasoning	Dec 20, 2024	GSM8K, Math	Code Available	2
Guiding Generative Protein Language Models with Reinforcement Learning	Dec 17, 2024	Diversity, reinforcement-learning	Code Available	2
Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data	Dec 10, 2024	Offline RL, Reinforcement Learning (RL)	Code Available	2
ManiSkill-HAB: A Benchmark for Low-Level Manipulation in Home Rearrangement Tasks	Dec 9, 2024	GPU, Imitation Learning	Code Available	2
Conformal Symplectic Optimization for Stable Reinforcement Learning	Dec 3, 2024	Atari Games, Deep Reinforcement Learning	Code Available	2
Revisiting Generative Policies: A Simpler Reinforcement Learning Algorithmic Perspective	Dec 2, 2024	Density Estimation, Offline RL	Code Available	2
Pretrained LLM Adapted with LoRA as a Decision Transformer for Offline RL in Quantitative Trading	Nov 26, 2024	Offline RL, parameter-efficient fine-tuning	Code Available	2
Natural Language Reinforcement Learning	Nov 21, 2024	Decision Making, reinforcement-learning	Code Available	2
AMAGO-2: Breaking the Multi-Task Barrier in Meta-Reinforcement Learning with Transformers	Nov 17, 2024	In-Context Learning, Meta-Learning	Code Available	2
TIPO: Text to Image with Text Presampling for Prompt Optimization	Nov 12, 2024	Image Generation, Language Modeling	Code Available	2
Kinetix: Investigating the Training of General Agents through Open-Ended Physics-Based Control Tasks	Oct 30, 2024	General Reinforcement Learning, Reinforcement Learning (RL)	Code Available	2
PC-Gym: Benchmark Environments For Process Control Problems	Oct 29, 2024	Benchmarking, Chemical Process	Code Available	2
ODRL: A Benchmark for Off-Dynamics Reinforcement Learning	Oct 28, 2024	Benchmarking, reinforcement-learning	Code Available	2
LongReward: Improving Long-context Large Language Models with AI Feedback	Oct 28, 2024	Offline RL, Reinforcement Learning (RL)	Code Available	2

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	PPG	Mean Normalized Performance	0.76	—	Unverified
2	PPO	Mean Normalized Performance	0.58	—	Unverified