SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to choose actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from the feedback it receives for its actions, in the form of rewards (positive) or punishments (negative rewards). The goal of reinforcement learning is to find an optimal policy, that is, a decision-making strategy that maximizes the expected long-term (cumulative) reward.
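To make the agent-environment loop concrete, here is a minimal sketch using tabular Q-learning, one classic RL algorithm among many. Everything in it is an illustrative assumption rather than something taken from the papers listed here: the toy corridor environment, the helper names `step`, `q_learning` and `greedy_policy`, and the hyperparameters (`alpha` learning rate, `gamma` discount factor, `epsilon` exploration rate). The agent starts at the left end of a six-cell corridor, receives a reward of +1 only when it reaches the right end, and learns which action maximizes the discounted cumulative reward from each cell.

```python
import random

# Hypothetical toy environment: a 1-D corridor of N_STATES cells.
# The agent starts in cell 0; reaching the last cell yields reward +1 and ends the episode.
N_STATES = 6
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Apply an action; return (next_state, reward, done)."""
    if action == 0:
        next_state = max(0, state - 1)  # the left wall keeps the agent in cell 0
    else:
        next_state = min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy (illustrative settings)."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon, otherwise act greedily (ties broken randomly).
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # TD target: immediate reward plus the discounted value of the best next action.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Return the action with the highest estimated value in each state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES)]


if __name__ == "__main__":
    q = q_learning()
    # In non-terminal cells the learned policy should be "move right" (action 1).
    print(greedy_policy(q)[:-1])
```

Because rewards arrive only at the goal, the discount factor `gamma` is what makes shorter paths worth more: under the optimal policy the value of moving right from cell `s` is `gamma ** (4 - s)`, so the agent learns to head straight for the goal. Real RL problems usually replace the lookup table with a function approximator such as a neural network, which is the setting of most of the deep RL papers listed below.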

Papers

Showing 13551–13600 of 15113 papers

Title | Status | Hype
Weakly Supervised Scene Text Detection using Deep Reinforcement Learning | Code | 0
Viewpoint Optimization for Autonomous Strawberry Harvesting with Deep Reinforcement Learning | Code | 0
Optimizing Warfarin Dosing using Deep Reinforcement Learning | Code | 0
When to Sense and Control? A Time-adaptive Approach for Continuous-Time RL | Code | 0
Scalable agent alignment via reward modeling: a research direction | Code | 0
UVIP: Model-Free Approach to Evaluate Reinforcement Learning Algorithms | Code | 0
VIME: Variational Information Maximizing Exploration | Code | 0
Multi-Agent Common Knowledge Reinforcement Learning | Code | 0
RamseyRL: A Framework for Intelligent Ramsey Number Counterexample Searching | Code | 0
VINE: An Open Source Interactive Data Visualization Tool for Neuroevolution | Code | 0
MineRL: A Large-Scale Dataset of Minecraft Demonstrations | Code | 0
Scalable Coordinated Exploration in Concurrent Reinforcement Learning | Code | 0
Successive Convex Approximation Based Off-Policy Optimization for Constrained Reinforcement Learning | Code | 0
MAC-PO: Multi-Agent Experience Replay via Collective Priority Optimization | Code | 0
Successor Feature Landmarks for Long-Horizon Goal-Conditioned Reinforcement Learning | Code | 0
Weak Supervision for Fake News Detection via Reinforcement Learning | Code | 0
Towards a Reinforcement Learning Environment Toolbox for Intelligent Electric Motor Control | Code | 0
Scalable Evaluation of Online Facilitation Strategies via Synthetic Simulation of Discussions | Code | 0
VIREL: A Variational Inference Framework for Reinforcement Learning | Code | 0
Successor Options: An Option Discovery Framework for Reinforcement Learning | Code | 0
Using machine learning to inform harvest control rule design in complex fishery settings | Code | 0
Successor Representation Active Inference | Code | 0
Successor Uncertainties: Exploration and Uncertainty in Temporal Difference Learning | Code | 0
RAIN: Reinforcement Algorithms for Improving Numerical Weather and Climate Models | Code | 0
Meta-Gradient Reinforcement Learning | Code | 0
Optimizing thermodynamic trajectories using evolutionary and gradient-based reinforcement learning | Code | 0
Virtual Augmented Reality for Atari Reinforcement Learning | Code | 0
RACCER: Towards Reachable and Certain Counterfactual Explanations for Reinforcement Learning | Code | 0
Mastering the Game of Sungka from Random Play | Code | 0
Towards Augmented Microscopy with Reinforcement Learning-Enhanced Workflows | Code | 0
Optimizing the Neural Architecture of Reinforcement Learning Agents | Code | 0
Multi-Agent Advisor Q-Learning | Code | 0
QVMix and QVMix-Max: Extending the Deep Quality-Value Family of Algorithms to Cooperative Multi-Agent Reinforcement Learning | Code | 0
Multi-Agent Adversarial Inverse Reinforcement Learning | Code | 0
Q-Value Weighted Regression: Reinforcement Learning with Limited Data | Code | 0
Neural Keyphrase Generation via Reinforcement Learning with Adaptive Rewards | Code | 0
Using Natural Language and Program Abstractions to Instill Human Inductive Biases in Machines | Code | 0
Using Natural Language for Reward Shaping in Reinforcement Learning | Code | 0
Super Reinforcement Bros: Playing Super Mario Bros with Reinforcement Learning | Code | 0
Using Offline Data to Speed Up Reinforcement Learning in Procedurally Generated Environments | Code | 0
Optimizing Power Grid Topologies with Reinforcement Learning: A Survey of Methods and Challenges | Code | 0
QUOTA: The Quantile Option Architecture for Reinforcement Learning | Code | 0
Optimizing Heat Alert Issuance with Reinforcement Learning | Code | 0
Two-step dynamic obstacle avoidance | Code | 0
Massively Parallel Methods for Deep Reinforcement Learning | Code | 0
Two steps to risk sensitivity | Code | 0
Supervised Learning-enhanced Multi-Group Actor Critic for Live Stream Allocation in Feed | Code | 0
Queueing Network Controls via Deep Reinforcement Learning | Code | 0
SUPERVISED POLICY UPDATE | Code | 0
Supervised Policy Update for Deep Reinforcement Learning | Code | 0
Page 272 of 303

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 |  | Unverified
2 | PPO | Mean Normalized Performance | 0.58 |  | Unverified