SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find the optimal policy, that is, the decision-making strategy that maximizes long-term reward.
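The interaction loop described above can be sketched with tabular Q-learning, one common RL algorithm chosen here purely for illustration. Everything in the sketch is an assumption rather than something taken from this page: the five-state chain environment, the `step`, `train`, and `greedy_policy` helpers, and the hyperparameter values are all hypothetical.

```python
import random

N_STATES = 5       # hypothetical chain: states 0..4, state 4 is terminal
ACTIONS = (0, 1)   # 0 = move left, 1 = move right


def step(state, action):
    """Toy deterministic environment: reward 1.0 only on reaching the last state."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, max_steps=100, seed=0):
    """Learn action values Q[state][action] from interaction with the environment."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            # Epsilon-greedy action choice; pick randomly when the values are tied.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # Q-learning target: immediate reward plus discounted best future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Best-known action for each non-terminal state."""
    return [1 if q[s][1] > q[s][0] else 0 for s in range(N_STATES - 1)]


if __name__ == "__main__":
    q_table = train()
    print(greedy_policy(q_table))
```

In this toy setup the only reward sits at the right end of the chain, so the learned greedy policy would be expected to choose "right" in every state. The discount factor `gamma` is what makes the agent value the delayed reward over short-term alternatives.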

Papers

Showing 8851-8875 of 15113 papers

Title | Status | Hype
Personalized HeartSteps: A Reinforcement Learning Algorithm for Optimizing Physical Activity | | 0
Personalized Lane Change Decision Algorithm Using Deep Reinforcement Learning Approach | | 0
Personalized Medical Treatments Using Novel Reinforcement Learning Algorithms | | 0
Personalizing a Dialogue System with Transfer Reinforcement Learning | | 0
Perspectives on the Social Impacts of Reinforcement Learning with Human Feedback | | 0
Perspective Taking in Deep Reinforcement Learning Agents | | 0
Mean-Variance Policy Iteration for Risk-Averse Reinforcement Learning | | 0
Persuading to Prepare for Quitting Smoking with a Virtual Coach: Using States and User Characteristics to Predict Behavior | | 0
Perturbational Complexity by Distribution Mismatch: A Systematic Analysis of Reinforcement Learning in Reproducing Kernel Hilbert Space | | 0
Perturbation-based exploration methods in deep reinforcement learning | | 0
Pessimism in the Face of Confounders: Provably Efficient Offline Reinforcement Learning in Partially Observable Markov Decision Processes | | 0
Pessimism Meets Risk: Risk-Sensitive Offline Reinforcement Learning | | 0
Pessimism meets VCG: Learning Dynamic Mechanism Design via Offline Reinforcement Learning | | 0
Pessimistic Causal Reinforcement Learning with Mediators for Confounded Offline Data | | 0
Pessimistic Model-based Offline Reinforcement Learning under Partial Coverage | | 0
Pessimistic Model Selection for Offline Deep Reinforcement Learning | | 0
Pessimistic Nonlinear Least-Squares Value Iteration for Offline Reinforcement Learning | | 0
Pessimistic Q-Learning for Offline Reinforcement Learning: Towards Optimal Sample Complexity | | 0
Petri Net Machines for Human-Agent Interaction | | 0
PFRL: Pose-Free Reinforcement Learning for 6D Pose Estimation | | 0
Phase Re-service in Reinforcement Learning Traffic Signal Control | | 0
Phasic Self-Imitative Reduction for Sparse-Reward Goal-Conditioned Reinforcement Learning | | 0
Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math | | 0
Phi-4-reasoning Technical Report | | 0
Phoebe: Reuse-Aware Online Caching with Reinforcement Learning for Emerging Storage Models | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | | Unverified