SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback, which arrives as rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes long-term reward.
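As a rough illustration of the loop described above, the sketch below runs tabular Q-learning on a made-up five-state corridor. The environment, the constants (N_STATES, GAMMA, ALPHA), and the helper names (step, train, greedy_policy) are all assumptions introduced for this example and do not come from any paper or benchmark listed on this page. The agent acts at random, observes rewards, and updates its value estimates; a greedy policy is then read off the learned table.

```python
import random

N_STATES = 5          # hypothetical corridor: states 0..4, state 4 is terminal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right
GAMMA = 0.9           # discount factor (assumed value)
ALPHA = 0.5           # learning rate (assumed value)


def step(state, action):
    """Toy environment dynamics: reward 1.0 only on reaching the last state."""
    move = 1 if action == 1 else -1
    next_state = min(max(state + move, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, max_steps=200, seed=0):
    """Tabular Q-learning driven by a uniformly random behavior policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action = rng.choice(ACTIONS)
            next_state, reward, done = step(state, action)
            # Target: immediate reward plus discounted best value of the next state.
            target = reward if done else reward + GAMMA * max(q[next_state])
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Pick the highest-valued action in every non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


q_table = train()
policy = greedy_policy(q_table)
print(policy)
```

With these settings the greedy policy should choose "right" in every non-terminal state, since that is the shortest path to the only reward. Because the data is collected by a random policy while the update targets the greedy one, this is also a small instance of off-policy learning, the theme of many titles listed below.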

Papers

Showing 8326-8350 of 15113 papers

Title | Status | Hype
Off-Policy Evaluation for Human Feedback | | 0
Off-policy Evaluation in Infinite-Horizon Reinforcement Learning with Latent Confounders | | 0
Off-Policy Evaluation in Partially Observable Environments | | 0
Off-Policy Evaluation via Off-Policy Classification | | 0
Off-Policy Fitted Q-Evaluation with Differentiable Function Approximators: Z-Estimation and Inference Theory | | 0
Off-Policy Meta-Reinforcement Learning Based on Feature Embedding Spaces | | 0
Off-Policy Policy Gradient Algorithms by Constraining the State Distribution Shift | | 0
Off-policy reinforcement learning for H∞ control design | | 0
Off-Policy Reinforcement Learning with Delayed Rewards | | 0
Off-policy Reinforcement Learning with Optimistic Exploration and Distribution Correction | | 0
Off-Policy Reinforcement Learning with High Dimensional Reward | | 0
Off-Policy Reinforcement Learning with Loss Function Weighted by Temporal Difference Error | | 0
Off-Policy Risk-Sensitive Reinforcement Learning Based Constrained Robust Optimal Control | | 0
Off-Policy Selection for Initiating Human-Centric Experimental Design | | 0
Off-Policy Self-Critical Training for Transformer in Visual Paragraph Generation | | 0
Off-Policy Shaping Ensembles in Reinforcement Learning | | 0
OffRIPP: Offline RL-based Informative Path Planning | | 0
Off-road Autonomous Vehicles Traversability Analysis and Trajectory Planning Based on Deep Inverse Reinforcement Learning | | 0
Offsetting Unequal Competition through RL-assisted Incentive Schemes | | 0
OffWorld Gym: open-access physical robotics environment for real-world reinforcement learning benchmark and research | | 0
Of Mice and Machines: A Comparison of Learning Between Real World Mice and RL Agents | | 0
OIL: Observational Imitation Learning | | 0
oIRL: Robust Adversarial Inverse Reinforcement Learning with Temporally Extended Actions | | 0
O-MAPL: Offline Multi-agent Preference Learning | | 0
Omega-Regular Objectives in Model-Free Reinforcement Learning | | 0
Page 334 of 605

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | | Unverified