SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes long-term reward.
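The interaction loop described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from any listed paper: the `TwoArmedBandit` environment, the `train` and `discounted_return` helpers, the epsilon-greedy exploration rule, and the parameter values (`epsilon=0.1`, `steps=500`, `seed=0`) are all hypothetical choices. It shows one simple way an agent might estimate action values from reward feedback and how a cumulative (discounted) reward could be computed.

```python
import random


def discounted_return(rewards, gamma=0.99):
    """Cumulative reward where each later reward is discounted by gamma."""
    total = 0.0
    # Work backwards so each step applies one more factor of gamma.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


class TwoArmedBandit:
    """Hypothetical toy environment: action 1 pays 1.0, action 0 pays 0.0."""

    def step(self, action):
        return 1.0 if action == 1 else 0.0


def train(steps=500, epsilon=0.1, seed=0):
    """Epsilon-greedy agent that keeps a running average reward per action."""
    rng = random.Random(seed)
    env = TwoArmedBandit()
    values = [0.0, 0.0]  # estimated reward for each action
    counts = [0, 0]      # how often each action was taken
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(2)  # explore: pick a random action
        else:
            action = max(range(2), key=lambda a: values[a])  # exploit
        reward = env.step(action)  # feedback from the environment
        counts[action] += 1
        # Incremental sample-average update of the value estimate.
        values[action] += (reward - values[action]) / counts[action]
    return values


values = train()
# The learned policy here is simply "pick the action with the highest estimate".
greedy_action = max(range(2), key=lambda a: values[a])
print(values, greedy_action)
print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))
```

A bandit has no state, so this is the simplest possible case; the methods in the papers listed below generally deal with sequential problems where actions also affect future states.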

Papers

Showing 8626-8650 of 15113 papers

Title | Status | Hype
Provably Efficient Reinforcement Learning via Surprise Bound | | 0
Provably Efficient Representation Selection in Low-rank Markov Decision Processes: From Online to Offline RL | | 0
Provably Efficient RL under Episode-Wise Safety in Constrained MDPs with Linear Function Approximation | | 0
Provably Feedback-Efficient Reinforcement Learning via Active Reward Learning | | 0
Provably Filtering Exogenous Distractors using Multistep Inverse Dynamics | | 0
Provably Good Batch Off-Policy Reinforcement Learning Without Great Exploration | | 0
Provably Safe Deep Reinforcement Learning for Robotic Manipulation in Human Environments | | 0
Provably Safe Model-Based Meta Reinforcement Learning: An Abstraction-Based Approach | | 0
Provably Safe Reinforcement Learning: Conceptual Analysis, Survey, and Benchmarking | | 0
Provably Safe Reinforcement Learning via Action Projection using Reachability Analysis and Polynomial Zonotopes | | 0
Provably Sample-Efficient RL with Side Information about Latent Dynamics | | 0
Proximal Bellman mappings for reinforcement learning and their application to robust adaptive filtering | | 0
Proximal Deterministic Policy Gradient | | 0
Proximal Policy Gradient Arborescence for Quality Diversity Reinforcement Learning | | 0
Proximal Policy Optimization and its Dynamic Version for Sequence Generation | | 0
Proximal Policy Optimization-Based Reinforcement Learning Approach for DC-DC Boost Converter Control: A Comparative Evaluation Against Traditional Control Techniques | | 0
Proximal Policy Optimization for Tracking Control Exploiting Future Reference Information | | 0
Proximal Policy Optimization via Enhanced Exploration Efficiency | | 0
Proximal Reinforcement Learning: A New Theory of Sequential Decision Making in Primal-Dual Spaces | | 0
Proximal Reliability Optimization for Reinforcement Learning | | 0
Proxy Experience Replay: Federated Distillation for Distributed Reinforcement Learning | | 0
Proxy-RLHF: Decoupling Generation and Alignment in Large Language Model with Proxy | | 0
Proxy Target: Bridging the Gap Between Discrete Spiking Neural Networks and Continuous Control | | 0
PRUDEX-Compass: Towards Systematic Evaluation of Reinforcement Learning in Financial Markets | | 0
Pruning the Way to Reliable Policies: A Multi-Objective Deep Q-Learning Approach to Critical Care | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | | Unverified