SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or punishments for its actions. The goal of reinforcement learning is to find an optimal policy, that is, a decision-making strategy that maximizes the expected long-term reward.
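The interaction loop described above can be sketched with tabular Q-learning, one common RL algorithm chosen here only for illustration. The five-state chain environment, the function names (`step`, `choose_action`, `train`, `greedy_policy`), and all hyperparameters are assumptions made for this sketch and do not come from any paper listed on this page.

```python
import random

N_STATES = 5  # hypothetical chain: states 0..4, state 4 is the goal


def step(state, action):
    """Toy environment (an assumption): action 1 moves right, action 0 moves left.

    Reaching the last state gives reward 1.0 and ends the episode.
    """
    if action == 1:
        next_state = min(state + 1, N_STATES - 1)
    else:
        next_state = max(state - 1, 0)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def choose_action(q_row, epsilon, rng):
    """Epsilon-greedy action choice; ties between the two actions are broken at random."""
    if rng.random() < epsilon or q_row[0] == q_row[1]:
        return rng.randrange(2)
    return 0 if q_row[0] > q_row[1] else 1


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Learn action values from reward feedback using the Q-learning update."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = choose_action(q[state], epsilon, rng)
            next_state, reward, done = step(state, action)
            # Target: immediate reward plus discounted value of the best next action.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Best action in each non-terminal state according to the learned values."""
    return [0 if row[0] > row[1] else 1 for row in q[:-1]]


q_table = train()
policy = greedy_policy(q_table)
print(policy)
```

Under these assumptions the learned greedy policy should move right in every non-terminal state, since that is the shortest path to the only reward. Larger problems typically replace the table with a function approximator.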

Papers

Showing 4901-4950 of 15113 papers

| Title | Status | Hype |
|---|---|---|
| Reward Shaping via Diffusion Process in Reinforcement Learning | | 0 |
| Adaptive Ordered Information Extraction with Deep Reinforcement Learning | Code | 0 |
| On the Model-Misspecification in Reinforcement Learning | | 0 |
| AdaStop: adaptive statistical testing for sound comparisons of Deep RL agents | Code | 0 |
| Enhancing variational quantum state diagonalization using reinforcement learning techniques | Code | 0 |
| Acceleration in Policy Optimization | | 0 |
| The RL Perceptron: Generalisation Dynamics of Policy Learning in High Dimensions | | 0 |
| Genes in Intelligent Agents | Code | 0 |
| Active Policy Improvement from Multiple Black-box Oracles | Code | 0 |
| Do as I can, not as I get | | 0 |
| Bootstrapped Representations in Reinforcement Learning | | 0 |
| Temporal Difference Learning with Experience Replay | | 0 |
| Semi-Offline Reinforcement Learning for Optimized Text Generation | Code | 0 |
| The False Dawn: Reevaluating Google's Reinforcement Learning for Chip Macro Placement | | 0 |
| Real-Time Network-Level Traffic Signal Control: An Explicit Multiagent Coordination Method | | 0 |
| Offline Multi-Agent Reinforcement Learning with Coupled Value Factorization | | 0 |
| Predictive Maneuver Planning with Deep Reinforcement Learning (PMP-DRL) for comfortable and safe autonomous driving | | 0 |
| Low-Switching Policy Gradient with Exploration via Online Sensitivity Sampling | | 0 |
| Langevin Thompson Sampling with Logarithmic Communication: Bandits and Reinforcement Learning | | 0 |
| Granger Causal Interaction Skill Chains | | 0 |
| A reinforcement learning strategy for p-adaptation in high order solvers | | 0 |
| Provably Efficient Offline Reinforcement Learning with Perturbed Data Sources | | 0 |
| Off-policy Evaluation in Doubly Inhomogeneous Environments | Code | 0 |
| Skill-Critic: Refining Learned Skills for Hierarchical Reinforcement Learning | | 0 |
| Simple Embodied Language Learning as a Byproduct of Meta-Reinforcement Learning | | 0 |
| Multi-market Energy Optimization with Renewables via Reinforcement Learning | | 0 |
| Pruning the Way to Reliable Policies: A Multi-Objective Deep Q-Learning Approach to Critical Care | | 0 |
| Unified Off-Policy Learning to Rank: a Reinforcement Learning Perspective | Code | 0 |
| A Simple Unified Uncertainty-Guided Framework for Offline-to-Online Reinforcement Learning | | 0 |
| DenseLight: Efficient Control for Large-scale Traffic Signals with Dense Feedback | Code | 0 |
| Can ChatGPT Enable ITS? The Case of Mixed Traffic Control via Reinforcement Learning | Code | 0 |
| A Primal-Dual-Critic Algorithm for Offline Constrained Reinforcement Learning | | 0 |
| Kernelized Reinforcement Learning with Order Optimal Regret Bounds | | 0 |
| Combining Reinforcement Learning and Barrier Functions for Adaptive Risk Management in Portfolio Optimization | | 0 |
| ENOTO: Improving Offline-to-Online Reinforcement Learning with Q-Ensembles | | 0 |
| Diverse Projection Ensembles for Distributional Reinforcement Learning | | 0 |
| Robust Reinforcement Learning through Efficient Adversarial Herding | | 0 |
| Tackling Heavy-Tailed Rewards in Reinforcement Learning with Function Approximation: Minimax Optimal and Instance-Dependent Regret Bounds | | 0 |
| Online Prototype Alignment for Few-shot Policy Transfer | Code | 0 |
| Reinforcement Learning in Robotic Motion Planning by Combined Experience-based Planning and Self-Imitation Learning | | 0 |
| PEAR: Primitive enabled Adaptive Relabeling for boosting Hierarchical Reinforcement Learning | | 0 |
| Bring Your Own (Non-Robust) Algorithm to Solve Robust MDPs by Estimating The Worst Kernel | | 0 |
| The Role of Diverse Replay for Generalisation in Reinforcement Learning | | 0 |
| Learning Not to Spoof | | 0 |
| Approximate information state based convergence analysis of recurrent Q-learning | | 0 |
| Iteratively Refined Behavior Regularization for Offline Reinforcement Learning | | 0 |
| Confidence-Controlled Exploration: Efficient Sparse-Reward Policy Learning for Robot Navigation | | 0 |
| Instructed Diffuser with Temporal Condition Guidance for Offline Reinforcement Learning | | 0 |
| Timing Process Interventions with Causal Inference and Reinforcement Learning | | 0 |
| State Regularized Policy Optimization on Data with Dynamics Shift | | 0 |
Page 99 of 303

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PPG | Mean Normalized Performance | 0.76 | | Unverified |
| 2 | PPO | Mean Normalized Performance | 0.58 | | Unverified |