SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback on its actions, given as rewards or penalties. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes the expected long-term reward.
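The interaction loop described above can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from any paper listed on this page: the five-state chain environment, the `step`, `train`, and `greedy_policy` helpers, and the choice of tabular Q-learning as the learning rule. It is one simple way to learn a policy from rewards, not a reference implementation.

```python
import random

N_STATES = 5        # hypothetical chain: states 0..4, state 4 is terminal
ACTIONS = (0, 1)    # 0 = move left, 1 = move right


def step(state, action):
    """Toy environment: reward 1.0 only when the last state is reached."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                # Explore: pick a random action.
                action = rng.choice(ACTIONS)
            else:
                # Exploit: pick a best-valued action, breaking ties randomly.
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Bootstrapped target: reward plus discounted best next value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Best action for every non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


q_table = train()
policy = greedy_policy(q_table)
print(policy)
```

In this toy setting the only reward sits at the right end of the chain, so the learned greedy policy should move right from every non-terminal state. The discount factor `gamma` is what turns the single delayed reward into a preference for reaching it in fewer steps.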

Papers

Showing 2126-2150 of 15113 papers

Title | Status | Hype
Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets | Code | 1
Reinforcement Learning Friendly Vision-Language Model for Minecraft | Code | 1
Clinician-in-the-Loop Decision Making: Reinforcement Learning with Near-Optimal Set-Valued Policies | Code | 1
KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality | Code | 1
Know Your Action Set: Learning Action Relations for Reinforcement Learning | Code | 1
Language Control Diffusion: Efficiently Scaling through Space, Time, and Tasks | Code | 1
Learning safety in model-based Reinforcement Learning using MPC and Gaussian Processes | Code | 1
Closed Loop Neural-Symbolic Learning via Integrating Neural Perception, Grammar Parsing, and Symbolic Reasoning | Code | 1
Closing the Gap between TD Learning and Supervised Learning -- A Generalisation Point of View | Code | 1
Coach-Player Multi-Agent Reinforcement Learning for Dynamic Team Composition | Code | 1
Co-Activation Graph Analysis of Safety-Verified and Explainable Deep Reinforcement Learning Policies | Code | 1
Learning Synthetic Environments and Reward Networks for Reinforcement Learning | Code | 1
Bidirectional Model-based Policy Optimization | Code | 1
B-Pref: Benchmarking Preference-Based Reinforcement Learning | Code | 1
Keyphrase Generation with Fine-Grained Evaluation-Guided Reinforcement Learning | Code | 1
Distributional Soft Actor-Critic: Off-Policy Reinforcement Learning for Addressing Value Estimation Errors | Code | 1
Karolos: An Open-Source Reinforcement Learning Framework for Robot-Task Environments | Code | 1
Knowledge Graph Reasoning with Self-supervised Reinforcement Learning | Code | 1
Jump-Start Reinforcement Learning | Code | 1
Learning to combine primitive skills: A step towards versatile robotic manipulation | Code | 1
Boosting Soft Actor-Critic: Emphasizing Recent Experience without Forgetting the Past | Code | 1
Kalman Filter Enhanced GRPO for Reinforcement Learning-Based Language Model Reasoning | Code | 1
Collaborative Multi-Agent Dialogue Model Training Via Reinforcement Learning | Code | 1
Blue River Controls: A toolkit for Reinforcement Learning Control Systems on Hardware | Code | 1
BOME! Bilevel Optimization Made Easy: A Simple First-Order Approach | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | - | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | - | Unverified