SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal of reinforcement learning is to find an optimal policy, that is, a decision-making strategy that maximizes the expected long-term (cumulative) reward.
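To make the agent-environment loop concrete, here is a minimal sketch of tabular Q-learning, one classic RL algorithm, on a hypothetical five-cell corridor. The environment, its +1 reward in the last cell, and the hyperparameters (alpha, gamma, epsilon) are illustrative assumptions, not taken from any paper listed on this page.

```python
import random

# Hypothetical toy environment: a corridor of N_STATES cells.
# The agent starts in cell 0; stepping into the last cell yields reward +1 and ends the episode.
N_STATES = 5
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Return (next_state, reward, done) for the toy corridor."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    if next_state == N_STATES - 1:
        return next_state, 1.0, True
    return next_state, 0.0, False


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning: estimate Q(s, a), the discounted cumulative reward of taking a in s."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                # Explore: pick a uniformly random action.
                action = rng.choice(ACTIONS)
            else:
                # Exploit: pick a highest-valued action, breaking ties at random.
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Target = immediate reward + discounted value of the best next action
            # (no bootstrapping past a terminal state).
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Read off the greedy policy: the best action in each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


q_table = q_learning()
print(greedy_policy(q_table))
```

With these settings the learned greedy policy should move right in every non-terminal cell, because that is the shortest path to the only reward; the discount factor gamma < 1 is what makes shorter paths worth more.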

Papers

Showing 7226–7250 of 15113 papers

Title | Status | Hype
VOQL: Towards Optimal Regret in Model-free RL with Nonlinear Function Approximation | | 0
Voting-Based Multi-Agent Reinforcement Learning for Intelligent IoT | | 0
VPE: Variational Policy Embedding for Transfer Reinforcement Learning | | 0
VRAIL: Vectorized Reward-based Attribution for Interpretable Learning | | 0
VRLS: A Unified Reinforcement Learning Scheduler for Vehicle-to-Vehicle Communications | | 0
Advancing Autonomous VLM Agents via Variational Subgoal-Conditioned Reinforcement Learning | | 0
Vulcan: Solving the Steiner Tree Problem with Graph Neural Networks and Deep Reinforcement Learning | | 0
Vulnerability-Aware Poisoning Mechanism for Online RL with Unknown Dynamics | | 0
WAD: A Deep Reinforcement Learning Agent for Urban Autonomous Driving | | 0
Wall Street Tree Search: Risk-Aware Planning for Offline Reinforcement Learning | | 0
Warm-Start Actor-Critic: From Approximation Error to Sub-optimality Gap | | 0
Warmth and competence in human-agent cooperation | | 0
Warm-up Free Policy Optimization: Improved Regret in Linear Markov Decision Processes | | 0
Warren at SemEval-2020 Task 4: ALBERT and Multi-Task Learning for Commonsense Validation | | 0
Wasserstein Actor-Critic: Directed Exploration via Optimism for Continuous-Actions Control | | 0
Wasserstein Adversarial Imitation Learning | | 0
Wasserstein Dependency Measure for Representation Learning | | 0
Wasserstein Robust Reinforcement Learning | | 0
Wasserstein Unsupervised Reinforcement Learning | | 0
Watch from sky: machine-learning-based multi-UAV network for predictive police surveillance | | 0
Stop-and-Go: Exploring Backdoor Attacks on Deep Reinforcement Learning-based Traffic Congestion Control Systems | | 0
WaveCorr: Deep Reinforcement Learning with Permutation Invariant Policy Networks for Portfolio Management | | 0
Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog | | 0
Way Off-Policy Batch Deep Reinforcement Learning of Human Preferences in Dialog | | 0
On L_2-consistency of nearest neighbor matching | | 0
Page 290 of 605

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | | Unverified
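The page does not define "Mean Normalized Performance". One common convention, used for example with the Procgen benchmark suite, min-max normalizes each environment's return, R_norm = (R - R_min) / (R_max - R_min), and then averages across environments. The sketch below assumes that convention; the environment names, returns, and bounds are hypothetical and are not the data behind the scores above.

```python
from statistics import mean


def normalized_score(raw, r_min, r_max):
    """Min-max normalize a raw episodic return (assumed convention)."""
    return (raw - r_min) / (r_max - r_min)


def mean_normalized_performance(raw_returns, bounds):
    """Average the normalized score over every environment in the suite."""
    return mean(normalized_score(raw_returns[env], *bounds[env]) for env in raw_returns)


# Hypothetical per-environment returns and (R_min, R_max) bounds, for illustration only.
bounds = {"env_a": (0.0, 10.0), "env_b": (5.0, 25.0)}
raw_returns = {"env_a": 8.0, "env_b": 15.0}
print(mean_normalized_performance(raw_returns, bounds))
```

Normalizing before averaging keeps environments with large reward scales from dominating the aggregate score.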