SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment to maximize a cumulative reward signal. The agent interacts with the environment and learns by receiving feedback in the form of rewards or punishments for its actions. The goal of reinforcement learning is to find the optimal policy or decision-making strategy that maximizes the long-term reward.
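
To make the interaction loop described above concrete, here is a minimal sketch using tabular Q-learning on a five-state chain. Q-learning is only one of many RL algorithms, chosen here for brevity. The environment, the function names (`step`, `train`, `greedy_policy`), and all hyperparameter values are hypothetical and are not taken from any paper listed on this page.

```python
import random

N_STATES = 5          # hypothetical toy chain: states 0..4, state 4 is terminal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right

def step(state, action):
    """Toy deterministic environment: returns (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    # Reward 1.0 only on reaching the terminal state, 0.0 otherwise.
    return next_state, (1.0 if done else 0.0), done

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration (assumed settings)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon, or when both actions tie.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # Bootstrapped target: immediate reward plus discounted best next value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

def greedy_policy(q):
    """Best-valued action for each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
```

Calling `greedy_policy(train())` returns the learned action for each non-terminal state. In this toy setting the agent should come to prefer moving right everywhere, since that is the only path to the reward.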

Papers

Showing 8701–8725 of 15113 papers

Title | Status | Hype
Option Discovery Using LLM-guided Semantic Hierarchical Reinforcement Learning | | 0
Option Encoder: A Framework for Discovering a Policy Basis in Reinforcement Learning | | 0
Option Hedging with Risk Averse Reinforcement Learning | | 0
Options as responses: Grounding behavioural hierarchies in multi-agent RL | | 0
OPtions as REsponses: Grounding behavioural hierarchies in multi-agent reinforcement learning | | 0
Options Discovery with Budgeted Reinforcement Learning | | 0
OptLayer - Practical Constrained Optimization for Deep Reinforcement Learning in the Real World | | 0
Oracle-Efficient Reinforcement Learning for Max Value Ensembles | | 0
Oracle-free Reinforcement Learning in Mean-Field Games along a Single Sample Path | | 0
Oracle Inequalities for Model Selection in Offline Reinforcement Learning | | 0
Oracles & Followers: Stackelberg Equilibria in Deep Multi-Agent Reinforcement Learning | | 0
OrbitZoo: Multi-Agent Reinforcement Learning Environment for Orbital Dynamics | | 0
Ordering-Based Causal Discovery with Reinforcement Learning | | 0
Order-Optimal Instance-Dependent Bounds for Offline Reinforcement Learning with Preference Feedback | | 0
Organ localisation using supervised and semi supervised approaches combining reinforcement learning with imitation learning | | 0
Orthogonal Estimation of Wasserstein Distances | | 0
Orthogonal Policy Gradient and Autonomous Driving Application | | 0
OSS Mentor A framework for improving developers contributions via deep reinforcement learning | | 0
OTC: Optimal Tool Calls via Reinforcement Learning | | 0
“Other-Play” for Zero-Shot Coordination | | 0
OThink-MR1: Stimulating multimodal generalized reasoning capabilities via dynamic reinforcement learning | | 0
OTTR: Off-Road Trajectory Tracking using Reinforcement Learning | | 0
Outcome-Constrained Large Language Models for Countering Hate Speech | | 0
Outcome-Driven Reinforcement Learning via Variational Inference | | 0
Outcome-Guided Counterfactuals for Reinforcement Learning Agents from a Jointly Trained Generative Latent Space | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | | Unverified