SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes long-term reward.
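As a rough illustration of the loop described above (act, receive a reward, update, repeat), here is a minimal sketch using tabular Q-learning on a toy five-cell corridor. The environment, the function names (`step`, `train`, `greedy_policy`), and all hyperparameters are assumptions made up for this example; they do not come from any paper listed on this page, and Q-learning is only one of many RL methods.

```python
import random

N_STATES = 5          # hypothetical corridor: cells 0..4, cell 4 is the goal
ACTIONS = (-1, +1)    # step left or step right


def step(state, action):
    """Toy environment: returns (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action_index]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap episode length
            # Explore with probability epsilon, or when both values tie.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Bootstrapped target: reward plus discounted best next value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Best action for each non-goal cell under the learned values."""
    return [ACTIONS[0 if row[0] > row[1] else 1] for row in q[:-1]]
```

Calling `greedy_policy(train())` should return the action `+1` (move right) for every non-goal cell, which is the policy that maximizes discounted long-term reward in this toy setting.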

Papers

Showing 351-400 of 15113 papers

Title | Status | Hype
Brax -- A Differentiable Physics Engine for Large Scale Rigid Body Simulation | Code | 2
DouZero: Mastering DouDizhu with Self-Play Deep Reinforcement Learning | Code | 2
Model-agnostic and Scalable Counterfactual Explanations via Reinforcement Learning | Code | 2
AndroidEnv: A Reinforcement Learning Platform for Android | Code | 2
MBRL-Lib: A Modular Library for Model-based Reinforcement Learning | Code | 2
AMP: Adversarial Motion Priors for Stylized Physics-Based Character Control | Code | 2
Learning to Fly -- a Gym Environment with PyBullet Physics for Reinforcement Learning of Multi-agent Quadcopter Control | Code | 2
Learning Accurate Long-term Dynamics for Model-based Reinforcement Learning | Code | 2
Revocable Deep Reinforcement Learning with Affinity Regularization for Outlier-Robust Graph Matching | Code | 2
Connections between Relational Event Model and Inverse Reinforcement Learning for Characterizing Group Interaction Sequences | Code | 2
SMARTS: Scalable Multi-Agent Reinforcement Learning Training School for Autonomous Driving | Code | 2
PettingZoo: Gym for Multi-Agent Reinforcement Learning | Code | 2
Decoupling Representation Learning from Reinforcement Learning | Code | 2
DRLE: Decentralized Reinforcement Learning at the Edge for Traffic Light Control in the IoV | Code | 2
Flightmare: A Flexible Quadrotor Simulator | Code | 2
Aligning AI With Shared Human Values | Code | 2
Smooth Exploration for Robotic Reinforcement Learning | Code | 2
The Variational Bandwidth Bottleneck: Stochastic Evaluation on an Information Budget | Code | 2
D4RL: Datasets for Deep Data-Driven Reinforcement Learning | Code | 2
Machine Learning in Asset Management—Part 2: Portfolio Construction—Weight Optimization. The Journal of Financial Data Science | Code | 2
Fiber: A Platform for Efficient Development and Distributed Training for Reinforcement Learning and Population-Based Methods | Code | 2
Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning | Code | 2
Neuroevolution of Self-Interpretable Agents | Code | 2
Leveraging Procedural Generation to Benchmark Reinforcement Learning | Code | 2
Learning to Predict Without Looking Ahead: World Models Without Forward Prediction | Code | 2
Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning | Code | 2
Generalized Inner Loop Meta-Learning | Code | 2
Emergent Tool Use From Multi-Agent Autocurricula | Code | 2
rlpyt: A Research Code Base for Deep Reinforcement Learning in PyTorch | Code | 2
Interactive Differentiable Simulation | Code | 2
Simulation to Scaled City: Zero-Shot Policy Transfer for Traffic Control via Autonomous Vehicles | Code | 2
Visual Reinforcement Learning with Imagined Goals | Code | 2
Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models | Code | 2
Accelerated Methods for Deep Reinforcement Learning | Code | 2
SQLNet: Generating Structured Queries From Natural Language Without Reinforcement Learning | Code | 2
Flow: A Modular Learning Framework for Mixed Autonomy Traffic | Code | 2
Learning through Dialogue Interactions by Asking Questions | Code | 2
Dialogue Learning With Human-In-The-Loop | Code | 2
Benchmarking Deep Reinforcement Learning for Continuous Control | Code | 2
A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning | Code | 2
Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination | Code | 1
Deep Reinforcement Learning with Gradient Eligibility Traces | Code | 1
A Practical Two-Stage Recipe for Mathematical LLMs: Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning | Code | 1
IRanker: Towards Ranking Foundation Model | Code | 1
KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality | Code | 1
Metis-RISE: RL Incentivizes and SFT Enhances Multimodal Reasoning Model Learning | Code | 1
A Production Scheduling Framework for Reinforcement Learning Under Real-World Constraints | Code | 1
Visual Pre-Training on Unlabeled Images using Reinforcement Learning | Code | 1
RePO: Replay-Enhanced Policy Optimization | Code | 1
ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | - | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | - | Unverified