SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment to maximize a cumulative reward signal. The agent interacts with the environment and learns by receiving feedback in the form of rewards or punishments for its actions. The goal of reinforcement learning is to find the optimal policy or decision-making strategy that maximizes the long-term reward.
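The interaction loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the `Corridor` environment, its reward of +1 at the right end, and all hyperparameter values are hypothetical, and tabular Q-learning is only one of many algorithms that could fill the learning step.

```python
import random


class Corridor:
    """Hypothetical toy environment: positions 0..length-1, start at 0,
    reward +1 when the agent reaches the rightmost position."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Action 0 moves left, action 1 moves right; walls clamp the position.
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def discounted_return(rewards, gamma):
    """Cumulative reward of one episode: sum over t of gamma**t * rewards[t]."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def train_q_learning(env, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration (assumed settings)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(env.length)]  # q[state][action]
    for _ in range(episodes):
        state = env.reset()
        done = False
        steps = 0
        while not done and steps < 100:  # cap episode length
            # Explore with probability epsilon, or when the estimates are tied.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = env.step(action)
            # Feedback from the environment updates the value estimate.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            steps += 1
    return q


def greedy_policy(q):
    """Policy that picks the action with the larger estimated value per state."""
    return [0 if left > right else 1 for left, right in q]
```

Calling `greedy_policy(train_q_learning(Corridor()))` yields one action per state; in this toy setting the learned policy should prefer moving right, since that is the only way to collect the reward.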

Papers

Showing 9251-9275 of 15113 papers

| Title | Status | Hype |
| --- | --- | --- |
| Provably Safe Deep Reinforcement Learning for Robotic Manipulation in Human Environments | | 0 |
| Provably Safe Model-Based Meta Reinforcement Learning: An Abstraction-Based Approach | | 0 |
| Provably Safe Reinforcement Learning: Conceptual Analysis, Survey, and Benchmarking | | 0 |
| Provably Safe Reinforcement Learning via Action Projection using Reachability Analysis and Polynomial Zonotopes | | 0 |
| Provably Sample-Efficient RL with Side Information about Latent Dynamics | | 0 |
| Proximal Bellman mappings for reinforcement learning and their application to robust adaptive filtering | | 0 |
| Proximal Deterministic Policy Gradient | | 0 |
| Proximal Policy Gradient Arborescence for Quality Diversity Reinforcement Learning | | 0 |
| Proximal Policy Optimization and its Dynamic Version for Sequence Generation | | 0 |
| Proximal Policy Optimization-Based Reinforcement Learning Approach for DC-DC Boost Converter Control: A Comparative Evaluation Against Traditional Control Techniques | | 0 |
| Proximal Policy Optimization for Tracking Control Exploiting Future Reference Information | | 0 |
| Proximal Policy Optimization via Enhanced Exploration Efficiency | | 0 |
| Proximal Reinforcement Learning: A New Theory of Sequential Decision Making in Primal-Dual Spaces | | 0 |
| Proximal Reliability Optimization for Reinforcement Learning | | 0 |
| Proxy Experience Replay: Federated Distillation for Distributed Reinforcement Learning | | 0 |
| Proxy-RLHF: Decoupling Generation and Alignment in Large Language Model with Proxy | | 0 |
| Proxy Target: Bridging the Gap Between Discrete Spiking Neural Networks and Continuous Control | | 0 |
| PRUDEX-Compass: Towards Systematic Evaluation of Reinforcement Learning in Financial Markets | | 0 |
| Pruning the Way to Reliable Policies: A Multi-Objective Deep Q-Learning Approach to Critical Care | | 0 |
| Pseudo-Model-Free Hedging for Variable Annuities via Deep Reinforcement Learning | | 0 |
| Pseudorehearsal in actor-critic agents | | 0 |
| Pseudorehearsal in actor-critic agents with neural network function approximation | | 0 |
| Pseudorehearsal in value function approximation | | 0 |
| PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning | | 0 |
| Reinforcement Learning and its Connections with Neuroscience and Psychology | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PPG | Mean Normalized Performance | 0.76 | | Unverified |
| 2 | PPO | Mean Normalized Performance | 0.58 | | Unverified |