SOTAVerified

Reinforcement Learning (RL)

Reinforcement learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes long-term reward.
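As a rough illustration of the loop described above, the sketch below runs tabular Q-learning on a made-up five-state corridor. Everything in it is an assumption chosen for illustration and is not taken from any paper listed on this page: the environment (`step`), the constants `GAMMA` and `ALPHA`, and the uniform random exploration strategy. The agent starts in state 0, receives a reward of 1 only on reaching state 4, and updates its action-value estimates after each step.

```python
import random

# Hypothetical toy setup (not from the source): a corridor of 5 states.
N_STATES = 5          # states 0..4; state 4 is the goal
ACTIONS = (0, 1)      # 0 = step left, 1 = step right
GAMMA = 0.9           # discount factor for long-term reward
ALPHA = 0.5           # learning rate


def step(state, action):
    """Toy environment dynamics: return (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, max_steps=100, seed=0):
    """Learn action values with Q-learning under uniform random exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action = rng.choice(ACTIONS)  # explore: Q-learning is off-policy
            next_state, reward, done = step(state, action)
            # Bootstrapped target: immediate reward plus discounted best next value.
            target = reward if done else reward + GAMMA * max(q[next_state])
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = train()
# Greedy policy for the non-terminal states 0..3.
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(greedy_policy)
```

The policy is read off the learned table by picking the highest-valued action in each state. Exploration is kept uniformly random here for simplicity; practical agents usually balance exploration against exploiting what they have already learned.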

Papers

Showing 8601–8650 of 15113 papers

Title | Status | Hype
Provably Efficient Cooperative Multi-Agent Reinforcement Learning with Function Approximation | | 0
Provably Efficient CVaR RL in Low-rank MDPs | | 0
Provably Efficient Exploration in Policy Optimization | | 0
Provably Efficient Exploration in Quantum Reinforcement Learning with Logarithmic Worst-Case Regret | | 0
Provably Efficient Exploration in Reward Machines with Low Regret | | 0
Provably Efficient Iterated CVaR Reinforcement Learning with Function Approximation and Human Feedback | | 0
Provably Efficient Lifelong Reinforcement Learning with Linear Function Approximation | | 0
Provably Efficient Model-Free Algorithms for Non-stationary CMDPs | | 0
Provably Efficient Multi-Agent Reinforcement Learning with Fully Decentralized Communication | | 0
Provably Efficient Multi-Task Reinforcement Learning with Model Transfer | | 0
Provably Efficient Offline Multi-agent Reinforcement Learning via Strategy-wise Bonus | | 0
Provably Efficient Offline Reinforcement Learning with Trajectory-Wise Reward | | 0
Provably Efficient Offline Reinforcement Learning with Perturbed Data Sources | | 0
Provably Efficient Primal-Dual Reinforcement Learning for CMDPs with Non-stationary Objectives and Constraints | | 0
Provably Efficient Q-learning with Function Approximation via Distribution Shift Error Checking Oracle | | 0
Provably Efficient Q-learning with Function Approximation via Distribution Shift Error Checking Oracle | | 0
Representation of Reinforcement Learning Policies in Reproducing Kernel Hilbert Spaces | | 0
Provably Efficient Reinforcement Learning with Aggregated States | | 0
Reinforcement Learning with General Value Function Approximation: Provably Efficient Approach via Bounded Eluder Dimension | | 0
Provably Efficient Reinforcement Learning with Kernel and Neural Function Approximations | | 0
Provably Efficient Reinforcement Learning with Linear Function Approximation Under Adaptivity Constraints | | 0
Provably Efficient Reinforcement Learning for Discounted MDPs with Feature Mapping | | 0
Provably Efficient Reinforcement Learning for Online Adaptive Influence Maximization | | 0
Provably Efficient Reinforcement Learning in Decentralized General-Sum Markov Games | | 0
Provably Efficient Reinforcement Learning in Partially Observable Dynamical Systems | | 0
Provably Efficient Reinforcement Learning via Surprise Bound | | 0
Provably Efficient Representation Selection in Low-rank Markov Decision Processes: From Online to Offline RL | | 0
Provably Efficient RL under Episode-Wise Safety in Constrained MDPs with Linear Function Approximation | | 0
Provably Feedback-Efficient Reinforcement Learning via Active Reward Learning | | 0
Provably Filtering Exogenous Distractors using Multistep Inverse Dynamics | | 0
Provably Good Batch Off-Policy Reinforcement Learning Without Great Exploration | | 0
Provably Safe Deep Reinforcement Learning for Robotic Manipulation in Human Environments | | 0
Provably Safe Model-Based Meta Reinforcement Learning: An Abstraction-Based Approach | | 0
Provably Safe Reinforcement Learning: Conceptual Analysis, Survey, and Benchmarking | | 0
Provably Safe Reinforcement Learning via Action Projection using Reachability Analysis and Polynomial Zonotopes | | 0
Provably Sample-Efficient RL with Side Information about Latent Dynamics | | 0
Proximal Bellman mappings for reinforcement learning and their application to robust adaptive filtering | | 0
Proximal Deterministic Policy Gradient | | 0
Proximal Policy Gradient Arborescence for Quality Diversity Reinforcement Learning | | 0
Proximal Policy Optimization and its Dynamic Version for Sequence Generation | | 0
Proximal Policy Optimization-Based Reinforcement Learning Approach for DC-DC Boost Converter Control: A Comparative Evaluation Against Traditional Control Techniques | | 0
Proximal Policy Optimization for Tracking Control Exploiting Future Reference Information | | 0
Proximal Policy Optimization via Enhanced Exploration Efficiency | | 0
Proximal Reinforcement Learning: A New Theory of Sequential Decision Making in Primal-Dual Spaces | | 0
Proximal Reliability Optimization for Reinforcement Learning | | 0
Proxy Experience Replay: Federated Distillation for Distributed Reinforcement Learning | | 0
Proxy-RLHF: Decoupling Generation and Alignment in Large Language Model with Proxy | | 0
Proxy Target: Bridging the Gap Between Discrete Spiking Neural Networks and Continuous Control | | 0
PRUDEX-Compass: Towards Systematic Evaluation of Reinforcement Learning in Financial Markets | | 0
Pruning the Way to Reliable Policies: A Multi-Objective Deep Q-Learning Approach to Critical Care | | 0
Page 173 of 303

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | | Unverified