SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find an optimal policy: a decision-making strategy that maximizes the expected long-term reward.
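
The interaction loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the corridor environment, the `step`, `train`, and `greedy_policy` helpers, and all hyperparameter values are hypothetical and do not come from any paper listed on this page. The learning rule shown is tabular Q-learning, chosen here because it is one of the simplest ways to learn a policy from reward feedback.

```python
import random

# Hypothetical toy environment (an assumption, not from the source): a corridor
# of N_STATES cells. The agent starts in cell 0; entering the last cell yields
# reward +1 and ends the episode. Every other transition yields reward 0.
N_STATES = 5
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Return (next_state, reward, done) for one environment transition."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                # Explore: pick a uniformly random action.
                action = rng.choice(ACTIONS)
            else:
                # Exploit: pick a best-valued action, breaking ties at random.
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Move Q(s, a) toward the reward plus the discounted best next value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Pick the highest-valued action in every non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
```

Calling `greedy_policy(train())` should return the action "move right" for every non-terminal cell, since that is the shortest route to the only reward. The discount factor `gamma` is what makes the agent prefer reaching the reward sooner, which is the "long-term reward" trade-off mentioned above.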

Papers

Showing 4976-5000 of 15113 papers

| Title | Status | Hype |
|---|---|---|
| Achieving Fairness in Multi-Agent Markov Decision Processes Using Reinforcement Learning | | 0 |
| IQL-TD-MPC: Implicit Q-Learning for Hierarchical Model Predictive Control | | 0 |
| Replicability in Reinforcement Learning | | 0 |
| MetaDiffuser: Diffusion Model as Conditional Planner for Offline Meta-RL | | 0 |
| Robust Reinforcement Learning Objectives for Sequential Recommender Systems | Code | 0 |
| Policy Optimization for Continuous Reinforcement Learning | | 0 |
| RL + Model-based Control: Using On-demand Optimal Control to Learn Versatile Legged Locomotion | | 0 |
| Off-Policy RL Algorithms Can be Sample-Efficient for Continuous Control via Sample Multiple Reuse | Code | 0 |
| Towards a Better Understanding of Representation Dynamics under TD-learning | | 0 |
| Bridging the Sim-to-Real Gap from the Information Bottleneck Perspective | Code | 0 |
| RLAD: Reinforcement Learning from Pixels for Autonomous Driving in Urban Environments | | 0 |
| Potential-based Credit Assignment for Cooperative RL-based Testing of Autonomous Vehicles | | 0 |
| The Curious Price of Distributional Robustness in Reinforcement Learning with a Generative Model | | 0 |
| Reinforcement Learning with Simple Sequence Priors | | 0 |
| Policy Synthesis and Reinforcement Learning for Discounted LTL | | 0 |
| Emergent Agentic Transformer from Chain of Hindsight Experience | | 0 |
| Learning Interpretable Models of Aircraft Handling Behaviour by Reinforcement Learning from Human Feedback | | 0 |
| Distributional Reinforcement Learning with Dual Expectile-Quantile Regression | | 0 |
| A Reminder of its Brittleness: Language Reward Shaping May Hinder Learning for Instruction Following Agents | Code | 0 |
| End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes | Code | 0 |
| DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models | Code | 0 |
| Deterministic policy gradient based optimal control with probabilistic constraints | | 0 |
| Reward-Machine-Guided, Self-Paced Reinforcement Learning | Code | 0 |
| Matrix Estimation for Offline Reinforcement Learning with Low-Rank Structure | | 0 |
| Decision-Aware Actor-Critic with Function Approximation and Theoretical Guarantees | Code | 0 |
Page 200 of 605

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PPG | Mean Normalized Performance | 0.76 | | Unverified |
| 2 | PPO | Mean Normalized Performance | 0.58 | | Unverified |