SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal of reinforcement learning is to find an optimal policy, that is, a decision-making strategy that maximizes the expected long-term (often discounted) cumulative reward.
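The following minimal Python sketch makes the agent-environment loop and the idea of a cumulative reward concrete. It uses tabular Q-learning, which is one classic RL algorithm among many and not one this page prescribes. The corridor environment, the names `step`, `discounted_return`, `q_learning` and `greedy_policy`, and the hyperparameter values (`alpha`, `gamma`, `epsilon`) are all hypothetical and exist only for illustration.

```python
import random

# Illustrative sketch only: the environment and every name below are hypothetical.


def step(state, action, n_states):
    """Hypothetical toy corridor: action 1 moves right, action 0 moves left.

    Reaching the rightmost state ends the episode with reward 1; every other
    transition gives reward 0.
    """
    if action == 1:
        next_state = min(state + 1, n_states - 1)
    else:
        next_state = max(state - 1, 0)
    done = next_state == n_states - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def discounted_return(rewards, gamma):
    """Cumulative discounted reward G = r_0 + gamma*r_1 + gamma^2*r_2 + ..."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def q_learning(n_states=6, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]
    for _ in range(episodes):
        state, done = 0, False  # every episode starts at the left end
        while not done:
            # Explore with probability epsilon; also pick randomly on ties.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action, n_states)
            # Bootstrapped target: no future value after a terminal transition.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Pick the highest-valued action in each state (ties go to action 1)."""
    return [0 if a0 > a1 else 1 for a0, a1 in q]
```

With the default settings, `greedy_policy(q_learning())` should choose action 1 (move right) in every non-terminal state, which is optimal for this corridor. `discounted_return([0.0, 0.0, 1.0], 0.9)` evaluates to approximately 0.81. The discount factor `gamma` makes the agent prefer reaching the reward sooner, which is why moving right is valued above wandering left.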

Papers

Showing 301–350 of 15,113 papers

Title | Status | Hype
Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling | Code | 2
Feedback Efficient Online Fine-Tuning of Diffusion Models | Code | 2
Direct Multi-Turn Preference Optimization for Language Agents | Code | 2
A Critical Evaluation of AI Feedback for Aligning Large Language Models | Code | 2
Digi-Q: Learning Q-Value Functions for Training Device-Control Agents | Code | 2
FlagVNE: A Flexible and Generalizable Reinforcement Learning Framework for Network Resource Allocation | Code | 2
Distributional Soft Actor-Critic with Three Refinements | Code | 2
Diffusion Actor-Critic with Entropy Regulator | Code | 2
Foundation Policies with Hilbert Representations | Code | 2
FurnitureBench: Reproducible Real-World Benchmark for Long-Horizon Complex Manipulation | Code | 2
Diffusion-based Reinforcement Learning via Q-weighted Variational Policy Optimization | Code | 2
DIAMBRA Arena: a New Reinforcement Learning Platform for Research and Experimentation | Code | 2
GenNBV: Generalizable Next-Best-View Policy for Active 3D Reconstruction | Code | 2
Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory | Code | 2
A Simulation Benchmark for Autonomous Racing with Large-Scale Human Data | Code | 2
Gradient Boosting Reinforcement Learning | Code | 2
AutoTriton: Automatic Triton Programming with Reinforcement Learning in LLMs | Code | 2
Graphs Meet AI Agents: Taxonomy, Progress, and Future Opportunities | Code | 2
DiffMimic: Efficient Motion Mimicking with Differentiable Physics | Code | 2
Heterogeneous Multi-Robot Reinforcement Learning | Code | 2
Diffusion Models for Reinforcement Learning: A Survey | Code | 2
Honor of Kings Arena: an Environment for Generalization in Competitive Reinforcement Learning | Code | 2
Developing A Multi-Agent and Self-Adaptive Framework with Deep Reinforcement Learning for Dynamic Portfolio Risk Management | Code | 2
DEP-RL: Embodied Exploration for Reinforcement Learning in Overactuated and Musculoskeletal Systems | Code | 2
A Review of Safe Reinforcement Learning: Methods, Theory and Applications | Code | 2
iLLM-TSC: Integration reinforcement learning and large language model for traffic signal control policy improvement | Code | 2
A Cooperation Graph Approach for Multiagent Sparse Reward Reinforcement Learning | Code | 2
Interactive Differentiable Simulation | Code | 2
Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving | Code | 2
Agent models: Internalizing Chain-of-Action Generation into Reasoning models | Code | 2
ARPO:End-to-End Policy Optimization for GUI Agents with Experience Replay | Code | 2
Dialogue Learning With Human-In-The-Loop | Code | 2
Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning | Code | 2
Jumanji: a Diverse Suite of Scalable Reinforcement Learning Environments in JAX | Code | 2
DynamicRAG: Leveraging Outputs of Large Language Model as Feedback for Dynamic Reranking in Retrieval-Augmented Generation | Code | 2
Language Models can Solve Computer Tasks | Code | 2
Revocable Deep Reinforcement Learning with Affinity Regularization for Outlier-Robust Graph Matching | Code | 2
AGILE: A Novel Reinforcement Learning Framework of LLM Agents | Code | 2
Benchmarking Deep Reinforcement Learning for Continuous Control | Code | 2
Benchmarking Potential Based Rewards for Learning Humanoid Locomotion | Code | 2
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models | Code | 2
Learning Physically Realizable Skills for Online Packing of General 3D Shapes | Code | 2
Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models | Code | 2
Decoupling Representation Learning from Reinforcement Learning | Code | 2
DayDreamer: World Models for Physical Robot Learning | Code | 2
Learn to Reason Efficiently with Adaptive Length-based Reward Shaping | Code | 2
Deep Reinforcement Learning for Multi-Agent Interaction | Code | 2
Demonstration-Guided Reinforcement Learning with Efficient Exploration for Task Automation of Surgical Robot | Code | 2
Bigger, Regularized, Optimistic: scaling for compute and sample-efficient continuous control | Code | 2
Curiosity-driven Red-teaming for Large Language Models | Code | 2

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | - | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | - | Unverified