SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback on its actions, given as rewards or penalties. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes long-term reward.
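The interaction loop described above can be sketched with tabular Q-learning, one of many RL algorithms and not one this page singles out. Everything in the sketch is an assumption made for illustration: the five-state chain environment, the `step`, `train`, and `greedy_policy` helpers, the discount factor `gamma`, and the other hyperparameters.

```python
import random

N_STATES = 5          # hypothetical toy task: states 0..4, state 4 is terminal
ACTIONS = (-1, +1)    # index 0 moves left, index 1 moves right


def step(state, action):
    """Toy environment: reward 1.0 only when the rightmost state is reached."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Learn action values q[state][action_index] by epsilon-greedy Q-learning."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.randrange(2)                      # explore
            else:
                best = max(q[state])                      # exploit, random tie-break
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Bootstrapped target: immediate reward plus discounted future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Best action (by learned value) for each non-terminal state."""
    return [ACTIONS[max(range(2), key=lambda i: q[s][i])] for s in range(N_STATES - 1)]


if __name__ == "__main__":
    q_values = train()
    print(greedy_policy(q_values))
```

In this toy task the only reward sits at the right end of the chain, so the learned greedy policy should choose the move-right action in every non-terminal state. The discount factor is a modelling choice here; the description above only speaks of long-term reward.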

Papers

Showing 12851–12900 of 15113 papers

Title | Status | Hype
Temporal Regularization for Markov Decision Process | Code | 0
Simplifying Deep Reinforcement Learning via Self-Supervision | Code | 0
Dynamic Multi-Reward Weighting for Multi-Style Controllable Generation | Code | 0
Pretraining the Vision Transformer using self-supervised methods for vision based Deep Reinforcement Learning | Code | 0
Temporal Regularization in Markov Decision Process | Code | 0
Rethinking Supervised Learning and Reinforcement Learning in Task-Oriented Dialogue Systems | Code | 0
Pretrained Bayesian Non-parametric Knowledge Prior in Robotic Long-Horizon Reinforcement Learning | Code | 0
Rethinking the Role of Proxy Rewards in Language Model Alignment | Code | 0
Reinforcement Learning with Dynamic Boltzmann Softmax Updates | Code | 0
Reinforcement Learning with Deep Energy-Based Policies | Code | 0
Molecular De Novo Design through Deep Reinforcement Learning | Code | 0
Reinforcement Learning with Brain-Inspired Modulation can Improve Adaptation to Environmental Changes | Code | 0
ZPD Teaching Strategies for Deep Reinforcement Learning from Demonstrations | Code | 0
Prejudge-Before-Think: Enhancing Large Language Models at Test-Time by Process Prejudge Reasoning | Code | 0
Retrospex: Language Agent Meets Offline Reinforcement Learning Critic | Code | 0
Reinforcement Learning with a Terminator | Code | 0
Sim-to-Real Reinforcement Learning for Deformable Object Manipulation | Code | 0
Towards Safe Mechanical Ventilation Treatment Using Deep Offline Reinforcement Learning | Code | 0
Preferences Implicit in the State of the World | Code | 0
Tensor and Matrix Low-Rank Value-Function Approximation in Reinforcement Learning | Code | 0
Reinforcement Learning with Algorithms from Probabilistic Structure Estimation | Code | 0
Towards Safe Policy Improvement for Non-Stationary MDPs | Code | 0
TensorFlow Agents: Efficient Batched Reinforcement Learning in TensorFlow | Code | 0
Preference-Guided Reinforcement Learning for Efficient Exploration | Code | 0
MOFGPT: Generative Design of Metal-Organic Frameworks using Language Models | Code | 0
Reinforcement Learning with Adaptive Regularization for Safe Control of Critical Systems | Code | 0
Online Cyber-Attack Detection in Smart Grid: A Reinforcement Learning Approach | Code | 0
Reinforcement Learning with a Corrupted Reward Channel | Code | 0
NARS vs. Reinforcement learning: ONA vs. Q-Learning | Code | 0
Integrating Distributed Architectures in Highly Modular RL Libraries | Code | 0
Myopic Bayesian Design of Experiments via Posterior Sampling and Probabilistic Programming | Code | 0
Preference-based Interactive Multi-Document Summarisation | Code | 0
Predictive World Models from Real-World Partial Observations | Code | 0
Simulation-Based Benchmarking of Reinforcement Learning Agents for Personalized Retail Promotions | Code | 0
Simulation-based reinforcement learning for real-world autonomous driving | Code | 0
Unified Distributed Environment | Code | 0
Reinforcement Learning with A* and a Deep Heuristic | Code | 0
Simulation of Nanorobots with Artificial Intelligence and Reinforcement Learning for Advanced Cancer Cell Detection and Tracking | Code | 0
Revisiting Fundamentals of Experience Replay | Code | 0
Towards Sample Efficient Agents through Algorithmic Alignment | Code | 0
Reinforcement Learning When All Actions are Not Always Available | Code | 0
Reinforcement Learning via Recurrent Convolutional Neural Networks | Code | 0
Predicting Research Trends From Arxiv | Code | 0
Revisiting Prioritized Experience Replay: A Value Perspective | Code | 0
Reinforcement Learning via Auxiliary Task Distillation | Code | 0
Online Baum-Welch algorithm for Hierarchical Imitation Learning | Code | 0
Simultaneous Double Q-learning with Conservative Advantage Learning for Actor-Critic Methods | Code | 0
Revisiting State Augmentation methods for Reinforcement Learning with Stochastic Delays | Code | 0
ViZDoom: A Doom-based AI Research Platform for Visual Reinforcement Learning | Code | 0
Towards Scalable Verification of Deep Reinforcement Learning | Code | 0
Page 258 of 303

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | | Unverified