SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback on its actions, given as rewards or penalties. The goal of reinforcement learning is to find an optimal policy, that is, a decision-making strategy that maximizes the expected long-term reward.
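As a rough illustration of the interaction loop described above, here is a minimal sketch using tabular Q-learning, which is only one of many RL algorithms. The five-state corridor environment, the function names (`step`, `train`, `greedy_policy`), and the hyperparameters are all assumptions made up for this example; none of them come from the papers listed on this page.

```python
import random

# Hypothetical toy environment: a 5-state corridor. The agent starts in
# state 0 and receives reward +1 only when it reaches state 4 (the goal).
N_STATES = 5
ACTIONS = (0, 1)  # 0 = step left, 1 = step right


def step(state, action):
    """Return (next_state, reward, done) for one environment transition."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon (or on ties), else act greedily.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # Bootstrapped target: immediate reward plus discounted best
            # estimated value of the next state (zero once the episode ends).
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Greedy action for every non-terminal state."""
    return [0 if row[0] > row[1] else 1 for row in q[:-1]]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

Here the reward is the only feedback the agent receives, and the policy is read off the learned action values afterwards. The discount factor `gamma` is what turns "long-term reward" into a concrete objective: rewards that arrive later count for less.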

Papers

Showing 9601–9625 of 15113 papers

Title | Status | Hype
Learning to Deceive Knowledge Graph Augmented Models via Targeted Perturbation | Code | 0
Planning with Exploration: Addressing Dynamics Bottleneck in Model-based Reinforcement Learning | | 0
Stochastic Inverse Reinforcement Learning | | 0
Stabilizing Transformer-Based Action Sequence Generation For Q-Learning | | 0
Towards Safe Policy Improvement for Non-Stationary MDPs | Code | 0
Learning Guidance Rewards with Trajectory-space Smoothing | Code | 1
Improved Worst-Case Regret Bounds for Randomized Least-Squares Value Iteration | | 0
Learning to Dispatch for Job Shop Scheduling via Deep Reinforcement Learning | Code | 1
Bridging Imagination and Reality for Model-Based Deep Reinforcement Learning | Code | 1
Multi-UAV Path Planning for Wireless Data Harvesting with Deep Reinforcement Learning | Code | 1
Option Hedging with Risk Averse Reinforcement Learning | | 0
Optimizing Coverage and Capacity in Cellular Networks using Machine Learning | | 0
Optimising Stochastic Routing for Taxi Fleets with Model Enhanced Reinforcement Learning | | 0
Reinforcement Learning with Combinatorial Actions: An Application to Vehicle Routing | Code | 1
Sample Efficient Reinforcement Learning with REINFORCE | | 0
Adversarial Attacks on Deep Algorithmic Trading Policies | | 0
Deep Reinforcement Learning with Stacked Hierarchical Attention for Text-based Games | Code | 0
Incorporating Stylistic Lexical Preferences in Generative Language Models | | 0
Detecting Rewards Deterioration in Episodic Reinforcement Learning | Code | 0
Batch Exploration with Examples for Scalable Robotic Reinforcement Learning | Code | 1
Error Bounds of Imitating Policies and Environments | | 0
CoinDICE: Off-Policy Confidence Interval Estimation | | 0
Accelerating Reinforcement Learning with Learned Skill Priors | Code | 1
What are the Statistical Limits of Offline RL with Linear Function Approximation? | | 0
Motion Planner Augmented Reinforcement Learning for Robot Manipulation in Obstructed Environments | | 0
Page 385 of 605

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | | Unverified