SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes long-term reward.
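
The interaction loop described above can be sketched in a few lines. The example below is only an illustration under stated assumptions: the five-state corridor environment, the `step`, `train`, and `greedy_policy` helpers, and all hyperparameter values are hypothetical, and tabular Q-learning is just one of many RL algorithms, picked here for brevity rather than taken from this page.

```python
import random

N_STATES = 5          # hypothetical corridor: states 0..4, state 4 is the goal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right


def step(state, action):
    """Toy environment (an assumption): return (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon, or when both actions are tied.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # Target: immediate reward plus discounted best future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Pick the highest-valued action in every non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

In this toy setting the only reward sits at the right end of the corridor, so the learned greedy policy should choose "right" in every non-terminal state. The discount factor `gamma` is what makes the agent weigh long-term reward rather than only the immediate one.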

Papers

Showing 6626-6650 of 15113 papers

Title | Status | Hype
Dominion: A New Frontier for AI Research | | 0
Done Is Better than Perfect: Unlocking Efficient Reasoning by Structured Multi-Turn Decomposition | | 0
Do No Harm: A Counterfactual Approach to Safe Reinforcement Learning | | 0
Don't do it: Safer Reinforcement Learning With Rule-based Guidance | | 0
Don't flatten, tokenize! Unlocking the key to SoftMoE's efficacy in deep RL | | 0
Don't Forget Your Teacher: A Corrective Reinforcement Learning Framework | | 0
Don't Get Yourself into Trouble! Risk-aware Decision-Making for Autonomous Vehicles | | 0
Don't Start From Scratch: Leveraging Prior Data to Automate Robotic Reinforcement Learning | | 0
Don't Until the Final Verb Wait: Reinforcement Learning for Simultaneous Machine Translation | | 0
DOOM: A Novel Adversarial-DRL-Based Op-Code Level Metamorphic Malware Obfuscator for the Enhancement of IDS | | 0
DOP: Deep Optimistic Planning with Approximate Value Function Evaluation | | 0
Do recent advancements in model-based deep reinforcement learning really improve data efficiency? | | 0
Importance of using appropriate baselines for evaluation of data-efficiency in deep reinforcement learning for Atari | | 0
Dot-to-Dot: Explainable Hierarchical Reinforcement Learning for Robotic Manipulation | | 0
Double A3C: Deep Reinforcement Learning on OpenAI Gym Games | | 0
Double Deep Q Networks for Sensor Management in Space Situational Awareness | | 0
Double Meta-Learning for Data Efficient Policy Optimization in Non-Stationary Environments | | 0
Double Q(σ) and Q(σ, λ): Unifying Reinforcement Learning Control Algorithms | | 0
Double Q-learning | | 0
Double Reinforcement Learning for Efficient and Robust Off-Policy Evaluation | | 0
Doubly Robust Off-Policy Actor-Critic Algorithms for Reinforcement Learning | | 0
Doubly Robust Off-policy Value Evaluation for Reinforcement Learning | | 0
DPO: A Differential and Pointwise Control Approach to Reinforcement Learning | | 0
DQLAP: Deep Q-Learning Recommender Algorithm with Update Policy for a Real Steam Turbine System | | 0
DQNAS: Neural Architecture Search using Reinforcement Learning | | 0
Page 266 of 605

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | | Unverified