SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback, receiving positive rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes long-term reward.
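The interaction loop described above can be sketched in a few lines of Python. This is only an illustrative toy, not taken from any paper listed here: the `Corridor` environment, its reward values (+1.0 at the goal, -0.1 per other step), the fixed policies, and the discount factor `gamma` are all assumptions chosen to keep the example small. It shows an agent acting, collecting rewards, and the cumulative (discounted) reward that an RL algorithm would try to maximize.

```python
from dataclasses import dataclass


@dataclass
class Corridor:
    """Hypothetical toy environment: cells 0..length-1, goal at the right end."""
    length: int = 5
    position: int = 0

    def reset(self) -> int:
        self.position = 0
        return self.position

    def step(self, action: int):
        # action 1 moves right; any other action moves left (walls clamp the move)
        move = 1 if action == 1 else -1
        self.position = max(0, min(self.length - 1, self.position + move))
        done = self.position == self.length - 1
        # Assumed reward scheme: +1.0 on reaching the goal, -0.1 penalty otherwise
        reward = 1.0 if done else -0.1
        return self.position, reward, done


def discounted_return(rewards, gamma=0.99):
    """Cumulative reward, with later rewards weighted by powers of gamma."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def run_episode(env, policy, max_steps=50):
    """Run one episode: the agent observes a state, acts, and receives a reward."""
    state = env.reset()
    rewards = []
    for _ in range(max_steps):
        action = policy(state)
        state, reward, done = env.step(action)
        rewards.append(reward)
        if done:
            break
    return rewards


def always_right(state):
    return 1


def always_left(state):
    return 0


rewards = run_episode(Corridor(), always_right)
print(rewards, discounted_return(rewards, gamma=0.9))
print(discounted_return(run_episode(Corridor(), always_left), gamma=0.9))
```

Here the two policies are fixed by hand. A learning algorithm would instead adjust the policy from the observed rewards until it prefers the behaviour with the higher return, which in this toy is always moving right.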

Papers

Showing 376–400 of 15113 papers

| Title | Status | Hype |
|---|---|---|
| Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning | Code | 2 |
| Generalized Inner Loop Meta-Learning | Code | 2 |
| Emergent Tool Use From Multi-Agent Autocurricula | Code | 2 |
| rlpyt: A Research Code Base for Deep Reinforcement Learning in PyTorch | Code | 2 |
| Interactive Differentiable Simulation | Code | 2 |
| Simulation to Scaled City: Zero-Shot Policy Transfer for Traffic Control via Autonomous Vehicles | Code | 2 |
| Visual Reinforcement Learning with Imagined Goals | Code | 2 |
| Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models | Code | 2 |
| Accelerated Methods for Deep Reinforcement Learning | Code | 2 |
| SQLNet: Generating Structured Queries From Natural Language Without Reinforcement Learning | Code | 2 |
| Flow: A Modular Learning Framework for Mixed Autonomy Traffic | Code | 2 |
| Learning through Dialogue Interactions by Asking Questions | Code | 2 |
| Dialogue Learning With Human-In-The-Loop | Code | 2 |
| Benchmarking Deep Reinforcement Learning for Continuous Control | Code | 2 |
| A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning | Code | 2 |
| Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination | Code | 1 |
| Deep Reinforcement Learning with Gradient Eligibility Traces | Code | 1 |
| A Practical Two-Stage Recipe for Mathematical LLMs: Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning | Code | 1 |
| IRanker: Towards Ranking Foundation Model | Code | 1 |
| KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality | Code | 1 |
| Metis-RISE: RL Incentivizes and SFT Enhances Multimodal Reasoning Model Learning | Code | 1 |
| A Production Scheduling Framework for Reinforcement Learning Under Real-World Constraints | Code | 1 |
| Visual Pre-Training on Unlabeled Images using Reinforcement Learning | Code | 1 |
| RePO: Replay-Enhanced Policy Optimization | Code | 1 |
| ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs | Code | 1 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PPG | Mean Normalized Performance | 0.76 | | Unverified |
| 2 | PPO | Mean Normalized Performance | 0.58 | | Unverified |