SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes long-term reward.
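
As a rough illustration of the loop described above, the following minimal sketch runs tabular Q-learning on a made-up five-state corridor. Everything in it is an assumption chosen for illustration and not taken from any listed paper: the corridor environment, the `step`, `train`, and `greedy_policy` helpers, and the constants `GAMMA` and `ALPHA`. The agent acts at random while learning (Q-learning is off-policy, so it can still estimate the values of the greedy policy), and the learned table is then read out as a policy.

```python
import random

N_STATES = 5          # corridor states 0..4; state 4 is the terminal goal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right
GAMMA = 0.9           # discount factor weighting long-term reward (assumed)
ALPHA = 0.5           # learning rate (assumed)


def step(state, action):
    """Hypothetical environment: returns (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, max_steps=100, seed=0):
    """Learn action values from interaction using the Q-learning update."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action = rng.choice(ACTIONS)  # purely exploratory behaviour
            next_state, reward, done = step(state, action)
            # Bootstrapped target: immediate reward plus discounted best
            # estimated value of the next state (zero past the terminal state).
            target = reward if done else reward + GAMMA * max(q[next_state])
            q[state][action] += ALPHA * (target - q[state][action])
            if done:
                break
            state = next_state
    return q


def greedy_policy(q):
    """Pick the highest-valued action in every non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


q_table = train()
policy = greedy_policy(q_table)
print(policy)
```

In this toy setting the only reward sits at the right end of the corridor, so the greedy policy should prefer moving right in every state. Real RL problems typically need function approximation and a more careful exploration strategy than uniform random actions.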

Papers

Showing 10401-10450 of 15113 papers

Title | Status | Hype
Off-Policy Reinforcement Learning with Loss Function Weighted by Temporal Difference Error |  | 0
Off-Policy Risk-Sensitive Reinforcement Learning Based Constrained Robust Optimal Control |  | 0
Off-Policy Selection for Initiating Human-Centric Experimental Design |  | 0
Off-Policy Self-Critical Training for Transformer in Visual Paragraph Generation |  | 0
Off-Policy Shaping Ensembles in Reinforcement Learning |  | 0
OffRIPP: Offline RL-based Informative Path Planning |  | 0
Off-road Autonomous Vehicles Traversability Analysis and Trajectory Planning Based on Deep Inverse Reinforcement Learning |  | 0
Offsetting Unequal Competition through RL-assisted Incentive Schemes |  | 0
OffWorld Gym: open-access physical robotics environment for real-world reinforcement learning benchmark and research |  | 0
Of Mice and Machines: A Comparison of Learning Between Real World Mice and RL Agents |  | 0
OIL: Observational Imitation Learning |  | 0
oIRL: Robust Adversarial Inverse Reinforcement Learning with Temporally Extended Actions |  | 0
O-MAPL: Offline Multi-agent Preference Learning |  | 0
Omega-Regular Objectives in Model-Free Reinforcement Learning |  | 0
Omega-Regular Reward Machines |  | 0
OMG-RL: Offline Model-based Guided Reward Learning for Heparin Treatment |  | 0
OmniDRL: Robust Pedestrian Detection using Deep Reinforcement Learning on Omnidirectional Cameras |  | 0
OmniRL: In-Context Reinforcement Learning by Large-Scale Meta-Training in Randomized Worlds |  | 0
On- and Off-Policy Monotonic Policy Improvement |  | 0
On Applications of Bootstrap in Continuous Space Reinforcement Learning |  | 0
On Assessing The Safety of Reinforcement Learning algorithms Using Formal Methods |  | 0
On Bellman equations for continuous-time policy evaluation I: discretization and approximation |  | 0
On Bellman's principle of optimality and Reinforcement learning for safety-constrained Markov decision process |  | 0
On-board Deep Q-Network for UAV-assisted Online Power Transfer and Data Collection |  | 0
On Computation and Generalization of Generative Adversarial Imitation Learning |  | 0
On Connections between Constrained Optimization and Reinforcement Learning |  | 0
On Convergence of Average-Reward Q-Learning in Weakly Communicating Markov Decision Processes |  | 0
On Convergence Rate of Adaptive Multiscale Value Function Approximation For Reinforcement Learning |  | 0
On Corruption-Robustness in Performative Reinforcement Learning |  | 0
On Covariate Shift of Latent Confounders in Imitation and Reinforcement Learning |  | 0
On Decentralizing Federated Reinforcement Learning in Multi-Robot Scenarios |  | 0
On Double Descent in Reinforcement Learning with LSTD and Random Features |  | 0
On Dynamic Programming Decompositions of Static Risk Measures in Markov Decision Processes |  | 0
On Efficiency in Hierarchical Reinforcement Learning |  | 0
On Enhancing Network Throughput using Reinforcement Learning in Sliced Testbeds |  | 0
One Policy but Many Worlds: A Scalable Unified Policy for Versatile Humanoid Locomotion |  | 0
One Policy is Enough: Parallel Exploration with a Single Policy is Near-Optimal for Reward-Free Reinforcement Learning |  | 0
One RL to See Them All: Visual Triple Unified Reinforcement Learning |  | 0
One-shot learning and behavioral eligibility traces in sequential decision making |  | 0
One-Shot Learning of Manipulation Skills with Online Dynamics Adaptation and Neural Network Priors |  | 0
One-shot, Offline and Production-Scalable PID Optimisation with Deep Reinforcement Learning |  | 0
One-Step Distributional Reinforcement Learning |  | 0
Sample Complexity of Offline Reinforcement Learning with Deep ReLU Networks |  | 0
On Gap-dependent Bounds for Offline Reinforcement Learning |  | 0
On Generalization and Distributional Update for Mimicking Observations with Adequate Exploration |  | 0
On Hard Exploration for Reinforcement Learning: a Case Study in Pommerman |  | 0
On Improving Cross-dataset Generalization of Deepfake Detectors |  | 0
On Improving Deep Reinforcement Learning for POMDPs |  | 0
On Inductive Biases in Deep Reinforcement Learning |  | 0
On Information Asymmetry in Competitive Multi-Agent Reinforcement Learning: Convergence and Optimality |  | 0
Page 209 of 303

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 |  | Unverified
2 | PPO | Mean Normalized Performance | 0.58 |  | Unverified