SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal of reinforcement learning is to find an optimal policy, that is, a decision-making strategy mapping states to actions, that maximizes the expected long-term (cumulative) reward.
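The agent-environment loop described above can be sketched with tabular Q-learning, one of many RL algorithms and not specific to any paper listed on this page. This is a minimal illustration under stated assumptions: the 5-state `Corridor` environment, its reward of +1 at the goal, and the hyperparameters (`alpha`, `gamma`, `epsilon`, `episodes`) are all hypothetical choices made for the example. `discounted_return` shows what "cumulative reward" means when future rewards are discounted by `gamma`.

```python
import random


class Corridor:
    """Hypothetical toy environment: states 0..n-1 on a line.

    The agent starts at state 0. Reaching state n-1 ends the episode with
    reward +1; every other step gives reward 0.
    """

    def __init__(self, n=5):
        self.n = n
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action 0 = move left, action 1 = move right (clamped to the corridor)
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.n - 1)
        done = self.state == self.n - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def discounted_return(rewards, gamma):
    # Cumulative discounted reward: G = r_0 + gamma*r_1 + gamma^2*r_2 + ...
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def q_learning(env, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1,
               seed=0, max_steps=100):
    rng = random.Random(seed)
    # Q-table: q[state][action] estimates the expected discounted return.
    q = [[0.0, 0.0] for _ in range(env.n)]
    for _ in range(episodes):
        s = env.reset()
        for _ in range(max_steps):
            # Epsilon-greedy action selection; ties are broken at random.
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s2, r, done = env.step(a)
            # Bootstrap from the best next action unless the episode ended.
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
            if done:
                break
    return q


if __name__ == "__main__":
    env = Corridor(n=5)
    q = q_learning(env)
    # Greedy policy for each non-terminal state after training.
    policy = ["right" if q[s][1] > q[s][0] else "left" for s in range(env.n - 1)]
    print(policy)
```

In this toy setting the learned greedy policy should move right from every non-terminal state, since that path reaches the reward in the fewest discounted steps.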

Papers

Showing 471-480 of 15113 papers

Title | Status | Hype
Langevin Soft Actor-Critic: Efficient Exploration through Uncertainty-Driven Critic Learning | Code | 1
xJailbreak: Representation Space Guided Reinforcement Learning for Interpretable LLM Jailbreaking | Code | 1
An Attentive Graph Agent for Topology-Adaptive Cyber Defence | Code | 1
From discrete-time policies to continuous-time diffusion samplers: Asymptotic equivalences and faster training | Code | 1
Co-Activation Graph Analysis of Safety-Verified and Explainable Deep Reinforcement Learning Policies | Code | 1
Exploiting Hybrid Policy in Reinforcement Learning for Interpretable Temporal Logic Manipulation | Code | 1
Enabling Realtime Reinforcement Learning at Scale with Staggered Asynchronous Inference | Code | 1
RL-LLM-DT: An Automatic Decision Tree Generation Method Based on RL Evaluation and LLM Enhancement | Code | 1
Entropy-Regularized Process Reward Model | Code | 1
Are Expressive Models Truly Necessary for Offline RL? | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | - | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | - | Unverified