SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from the feedback it receives, in the form of rewards or penalties, for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes long-term reward.
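
To make the agent-environment loop concrete, here is a minimal sketch using tabular Q-learning, which is only one of many RL algorithms. The five-state corridor environment, the function names (`step`, `train`, `greedy_policy`) and the hyperparameter values are all assumptions made up for this illustration; none of them come from the papers listed on this page.

```python
import random

N_STATES = 5      # hypothetical corridor: states 0..4, state 4 is terminal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Toy environment: reward 1.0 only when the rightmost state is reached."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values Q[state][action] with epsilon-greedy Q-learning."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                # Explore: pick a random action.
                action = rng.choice(ACTIONS)
            else:
                # Exploit: pick a best-valued action, breaking ties at random.
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Bootstrapped target: immediate reward plus discounted future value.
            future = 0.0 if done else gamma * max(q[next_state])
            q[state][action] += alpha * (reward + future - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Best action in each non-terminal state according to the learned values."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
```

Calling `greedy_policy(train())` on this toy problem yields the policy "move right" in every non-terminal state, which is the strategy that maximizes the discounted long-term reward here. Real benchmarks replace the hand-written `step` function with a full environment and the table with a function approximator.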

Papers

Showing 2126-2150 of 15113 papers

| Title | Status | Hype |
|---|---|---|
| UDUC: An Uncertainty-driven Approach for Learning-based Robust Control | | 0 |
| Learning Robot Soccer from Egocentric Vision with Deep Reinforcement Learning | | 0 |
| Natural Policy Gradient and Actor Critic Methods for Constrained Multi-Task Reinforcement Learning | | 0 |
| Proximal Curriculum with Task Correlations for Deep Reinforcement Learning | Code | 0 |
| Learning Robust Autonomous Navigation and Locomotion for Wheeled-Legged Robots | | 0 |
| A Model-based Multi-Agent Personalized Short-Video Recommender System | | 0 |
| Zero-Sum Positional Differential Games as a Framework for Robust Reinforcement Learning: Deep Q-Learning Approach | | 0 |
| Simulating the Economic Impact of Rationality through Reinforcement Learning and Agent-Based Modelling | Code | 1 |
| Model-based reinforcement learning for protein backbone design | | 0 |
| Learning Optimal Deterministic Policies with Stochastic Policy Gradients | | 0 |
| Robust Risk-Sensitive Reinforcement Learning with Conditional Value-at-Risk | | 0 |
| Balance Reward and Safety Optimization for Safe Reinforcement Learning: A Perspective of Gradient Manipulation | Code | 5 |
| Reinforcement Learning for Edit-Based Non-Autoregressive Neural Machine Translation | | 0 |
| Reinforcement Learning-Guided Semi-Supervised Learning | | 0 |
| Constrained Reinforcement Learning Under Model Mismatch | | 0 |
| Tabular and Deep Reinforcement Learning for Gittins Index | | 0 |
| FLAME: Factuality-Aware Alignment for Large Language Models | | 0 |
| Plan-Seq-Learn: Language Model Guided RL for Solving Long Horizon Robotics Tasks | | 0 |
| Learning Force Control for Legged Manipulation | | 0 |
| Queue-based Eco-Driving at Roundabouts with Reinforcement Learning | | 0 |
| No Representation, No Trust: Connecting Representation, Collapse, and Trust Issues in PPO | Code | 1 |
| Navigating WebAI: Training Agents to Complete Web Tasks with Large Language Models and Reinforcement Learning | | 0 |
| Leveraging Sub-Optimal Data for Human-in-the-Loop Reinforcement Learning | | 0 |
| Towards Generalist Robot Learning from Internet Video: A Survey | | 0 |
| Countering Reward Over-optimization in LLM with Demonstration-Guided Reinforcement Learning | Code | 0 |
Page 86 of 605

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PPG | Mean Normalized Performance | 0.76 | | Unverified |
| 2 | PPO | Mean Normalized Performance | 0.58 | | Unverified |