SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes long-term reward.
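
The interaction loop described above can be illustrated with a minimal sketch. It uses tabular Q-learning, which is only one of many RL algorithms and is not tied to any paper listed here. The 5-state chain environment, the `step` and `train` functions, and all hyperparameter values are hypothetical choices made purely for illustration.

```python
import random

def step(state, action, n_states=5):
    """Hypothetical 1-D chain: action 1 moves right, action 0 moves left.
    Reaching the rightmost state gives reward 1 and ends the episode."""
    if action == 1:
        next_state = min(state + 1, n_states - 1)
    else:
        next_state = max(state - 1, 0)
    done = next_state == n_states - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done

def train(episodes=500, n_states=5, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration (illustrative values)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon, or when both actions look equal.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action, n_states)
            # Bootstrapped target: immediate reward plus discounted best next value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q_table = train()
# Greedy policy for the non-terminal states: 1 means "move right".
policy = [0 if left > right else 1 for left, right in q_table[:-1]]
```

In this toy setting the agent receives feedback only at the final state, yet the discounted updates propagate that reward backwards, so the greedy policy ends up moving right from every non-terminal state.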

Papers

Showing 7901-7925 of 15113 papers

| Title | Status | Hype |
| --- | --- | --- |
| Branch Prediction as a Reinforcement Learning Problem: Why, How and Case Studies | | 0 |
| Control of a Mixed Autonomy Signalised Urban Intersection: An Action-Delayed Reinforcement Learning Approach | | 0 |
| Brax -- A Differentiable Physics Engine for Large Scale Rigid Body Simulation | Code | 2 |
| Density Constrained Reinforcement Learning | | 0 |
| Model-Based Reinforcement Learning via Latent-Space Collocation | Code | 1 |
| Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation | Code | 1 |
| Hierarchically Integrated Models: Learning to Navigate from Heterogeneous Robots | | 0 |
| The Option Keyboard: Combining Skills in Reinforcement Learning | | 0 |
| Reinforcement Learning-based Dialogue Guided Event Extraction to Exploit Argument Relations | Code | 1 |
| Evolving Hierarchical Memory-Prediction Machines in Multi-Task Reinforcement Learning | | 0 |
| Bregman Gradient Policy Optimization | Code | 0 |
| Uncertainty-Aware Model-Based Reinforcement Learning with Application to Autonomous Driving | | 0 |
| Provably Efficient Representation Selection in Low-rank Markov Decision Processes: From Online to Offline RL | | 0 |
| Uniform-PAC Bounds for Reinforcement Learning with Linear Function Approximation | | 0 |
| Local policy search with Bayesian optimization | Code | 1 |
| Off-Policy Reinforcement Learning with Delayed Rewards | | 0 |
| MMD-MIX: Value Function Factorisation with Maximum Mean Discrepancy for Cooperative Multi-Agent Reinforcement Learning | | 0 |
| Variance-Aware Off-Policy Evaluation with Linear Function Approximation | | 0 |
| Reinforcement Learning for Physical Layer Communications | Code | 0 |
| Agnostic Reinforcement Learning with Low-Rank MDPs and Rich Observations | | 0 |
| A Reduction-Based Framework for Conservative Bandits and Reinforcement Learning | | 0 |
| Lifted Model Checking for Relational MDPs | | 0 |
| Distributed Heuristic Multi-Agent Path Finding with Communication | Code | 1 |
| Cogment: Open Source Framework For Distributed Multi-actor Training, Deployment & Operations | | 0 |
| Emphatic Algorithms for Deep Reinforcement Learning | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PPG | Mean Normalized Performance | 0.76 | | Unverified |
| 2 | PPO | Mean Normalized Performance | 0.58 | | Unverified |