SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal of reinforcement learning is to find an optimal policy, that is, a decision-making strategy that maximizes the long-term reward.
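The interaction loop described above can be illustrated with a minimal sketch of tabular Q-learning, one common RL algorithm among many. Everything in it is an assumption made for illustration: the five-state corridor environment, the `step`, `choose_action`, and `train` helpers, and the hyperparameter values are hypothetical and do not come from any paper listed on this page.

```python
import random

N_STATES = 5          # hypothetical corridor: states 0..4, state 4 is the goal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right


def step(state, action):
    """Toy environment (an assumption): return (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0   # reward only when the goal is reached
    return next_state, reward, done


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, else act greedily."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state])
    # Break ties at random so the untrained agent does not get stuck.
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, max_steps=100, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action = choose_action(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # Q-learning update: move the estimate toward the reward plus
            # the discounted value of the best action in the next state.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = train()
# Greedy policy for the non-terminal states 0..3.
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(greedy_policy)
```

In this toy setting the learned greedy policy should prefer moving right in every non-terminal state, since that is the only way to collect the reward. The discount factor `gamma` is what makes the agent value the delayed reward at the end of the corridor.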

Papers

Showing 8076-8100 of 15113 papers

Title | Status | Hype
Optimizing the Long-Term Behaviour of Deep Reinforcement Learning for Pushing and Grasping |  | 0
Optimizing Traffic Lights with Multi-agent Deep Reinforcement Learning and V2X communication |  | 0
Optimizing Trajectories for Highway Driving with Offline Reinforcement Learning |  | 0
Optimizing Wireless Discontinuous Reception via MAC Signaling Learning |  | 0
Option Compatible Reward Inverse Reinforcement Learning |  | 0
Option Discovery in Hierarchical Reinforcement Learning using Spatio-Temporal Clustering |  | 0
Option Discovery Using LLM-guided Semantic Hierarchical Reinforcement Learning |  | 0
Option Encoder: A Framework for Discovering a Policy Basis in Reinforcement Learning |  | 0
Option Hedging with Risk Averse Reinforcement Learning |  | 0
Options as responses: Grounding behavioural hierarchies in multi-agent RL |  | 0
OPtions as REsponses: Grounding behavioural hierarchies in multi-agent reinforcement learning |  | 0
Options Discovery with Budgeted Reinforcement Learning |  | 0
OptLayer - Practical Constrained Optimization for Deep Reinforcement Learning in the Real World |  | 0
Oracle-Efficient Reinforcement Learning for Max Value Ensembles |  | 0
Oracle-free Reinforcement Learning in Mean-Field Games along a Single Sample Path |  | 0
Oracle Inequalities for Model Selection in Offline Reinforcement Learning |  | 0
Oracles & Followers: Stackelberg Equilibria in Deep Multi-Agent Reinforcement Learning |  | 0
OrbitZoo: Multi-Agent Reinforcement Learning Environment for Orbital Dynamics |  | 0
Ordering-Based Causal Discovery with Reinforcement Learning |  | 0
Order-Optimal Instance-Dependent Bounds for Offline Reinforcement Learning with Preference Feedback |  | 0
Organ localisation using supervised and semi supervised approaches combining reinforcement learning with imitation learning |  | 0
Orthogonal Estimation of Wasserstein Distances |  | 0
Orthogonal Policy Gradient and Autonomous Driving Application |  | 0
OSS Mentor A framework for improving developers contributions via deep reinforcement learning |  | 0
OTC: Optimal Tool Calls via Reinforcement Learning |  | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 |  | Unverified
2 | PPO | Mean Normalized Performance | 0.58 |  | Unverified