SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or punishments for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes the long-term reward.
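The interaction loop described above can be sketched with tabular Q-learning, one common RL method chosen here purely for illustration. Everything in the block is an assumption rather than something taken from this page: the five-state chain environment, the `step`, `train`, and `greedy_policy` helpers, the discount factor `GAMMA`, and the learning rate `ALPHA` are all hypothetical.

```python
import random

N_STATES = 5      # hypothetical chain: states 0..4, state 4 is the goal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right
GAMMA = 0.9       # discount factor (assumption; the text only says "long-term reward")
ALPHA = 0.5       # learning rate (assumption)


def step(state, action):
    """Toy environment: return (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, max_steps=1000, seed=0):
    """Learn a Q-table from interaction with the toy environment."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            # Behave uniformly at random; Q-learning is off-policy, so it
            # can still estimate the values of the greedy policy.
            action = rng.choice(ACTIONS)
            next_state, reward, done = step(state, action)
            # Bootstrapped target: reward plus discounted best next value.
            target = reward if done else reward + GAMMA * max(q[next_state])
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Pick the highest-valued action in each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    q_table = train()
    print(greedy_policy(q_table))
```

In this toy setting the reward arrives only at the goal, so the learned policy should prefer moving right in every non-terminal state. Real tasks typically need an exploration strategy and function approximation in place of a uniform random walk and a lookup table.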

Papers

Showing 11051-11100 of 15113 papers

| Title | Status | Hype |
| --- | --- | --- |
| Near-optimal Regret Bounds for Stochastic Shortest Path | | 0 |
| Rapidly Personalizing Mobile Health Treatment Policies with Limited Data | | 0 |
| Optimizing Traffic Lights with Multi-agent Deep Reinforcement Learning and V2X communication | | 0 |
| Discriminative Particle Filter Reinforcement Learning for Complex Partial Observations | Code | 1 |
| Deep Reinforcement Learning with Linear Quadratic Regulator Regions | | 0 |
| Adversarial Radar Inference. From Inverse Tracking to Inverse Reinforcement Learning of Cognitive Radar | | 0 |
| Automatic Data Augmentation via Deep Reinforcement Learning for Effective Kidney Tumor Segmentation | | 0 |
| Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion | | 0 |
| Vehicle Tracking in Wireless Sensor Networks via Deep Reinforcement Learning | | 0 |
| Reinforcement Learning Framework for Deep Brain Stimulation Study | Code | 1 |
| Data Freshness and Energy-Efficient UAV Navigation Optimization: A Deep Reinforcement Learning Approach | | 0 |
| On the Search for Feedback in Reinforcement Learning | | 0 |
| Disentangling Controllable Object through Video Prediction Improves Visual Reinforcement Learning | | 0 |
| Accelerating Reinforcement Learning with a Directional-Gaussian-Smoothing Evolution Strategy | | 0 |
| Automatic Gesture Recognition in Robot-assisted Surgery with Reinforcement Learning and Tree Search | | 0 |
| Enhanced Adversarial Strategically-Timed Attacks against Deep Reinforcement Learning | | 0 |
| Adaptive Temporal Difference Learning with Linear Function Approximation | | 0 |
| oIRL: Robust Adversarial Inverse Reinforcement Learning with Temporally Extended Actions | | 0 |
| Multi-Agent Reinforcement Learning as a Computational Tool for Language Evolution Research: Historical Context and Future Challenges | | 0 |
| Multi-Agent Meta-Reinforcement Learning for Self-Powered and Sustainable Edge Computing Systems | | 0 |
| Debiased Off-Policy Evaluation for Recommendation Systems | | 0 |
| UAV Aided Search and Rescue Operation Using Reinforcement Learning | | 0 |
| Sim2Real Transfer for Reinforcement Learning without Dynamics Randomization | Code | 1 |
| Value-driven Hindsight Modelling | | 0 |
| Optimistic Policy Optimization with Bandit Feedback | | 0 |
| Efficient Deep Reinforcement Learning via Adaptive Policy Transfer | Code | 0 |
| Curriculum in Gradient-Based Meta-Reinforcement Learning | | 0 |
| How To Avoid Being Eaten By a Grue: Exploration Strategies for Text-Adventure Agents | Code | 1 |
| Keep Doing What Worked: Behavioral Modelling Priors for Offline Reinforcement Learning | | 0 |
| Generating Automatic Curricula via Self-Supervised Active Domain Randomization | Code | 1 |
| Empirical Policy Evaluation with Supergraphs | | 0 |
| Adaptive Estimator Selection for Off-Policy Evaluation | Code | 0 |
| KoGuN: Accelerating Deep Reinforcement Learning via Integrating Human Suboptimal Knowledge | | 0 |
| MoTiAC: Multi-Objective Actor-Critics for Real-Time Bidding | | 0 |
| Reinforcement Learning for Molecular Design Guided by Quantum Mechanics | Code | 1 |
| Multi-Issue Bargaining With Deep Reinforcement Learning | | 0 |
| Langevin DQN | Code | 0 |
| Kalman meets Bellman: Improving Policy Evaluation through Value Tracking | Code | 1 |
| Control Frequency Adaptation via Action Persistence in Batch Reinforcement Learning | Code | 0 |
| Adaptive Experience Selection for Policy Gradient | | 0 |
| Reinforcement learning for the privacy preservation and manipulation of eye tracking data | | 0 |
| Reward Design for Driver Repositioning Using Multi-Agent Reinforcement Learning | | 0 |
| Reinforced active learning for image segmentation | Code | 1 |
| R-MADDPG for Partially Observable Environments and Limited Communication | Code | 1 |
| Investigating Simple Object Representations in Model-Free Deep Reinforcement Learning | | 0 |
| The Archimedean trap: Why traditional reinforcement learning will probably not yield AGI | | 0 |
| Non-asymptotic Convergence of Adam-type Reinforcement Learning Algorithms under Markovian Sampling | | 0 |
| PDDLGym: Gym Environments from PDDL Problems | Code | 1 |
| Universal Value Density Estimation for Imitation Learning and Goal-Conditioned Reinforcement Learning | Code | 0 |
| Deep RL Agent for a Real-Time Action Strategy Game | Code | 1 |
Page 222 of 303

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PPG | Mean Normalized Performance | 0.76 | | Unverified |
| 2 | PPO | Mean Normalized Performance | 0.58 | | Unverified |