SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes long-term reward.
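To make the loop described above concrete, here is a minimal sketch using tabular Q-learning, which is one common RL algorithm among many and is not tied to any paper listed on this page. The 5-state corridor environment, the function names (`step`, `discounted_return`, `train_q_learning`, `greedy_policy`), and all hyperparameter values are assumptions chosen purely for illustration.

```python
import random

N_STATES = 5          # states 0..4; state 4 is the rewarding terminal state
LEFT, RIGHT = 0, 1    # the two available actions


def step(state, action):
    """Hypothetical toy environment: a deterministic 5-state corridor."""
    if action == RIGHT:
        next_state = min(state + 1, N_STATES - 1)
    else:
        next_state = max(state - 1, 0)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0   # reward only on reaching the goal
    return next_state, reward, done


def discounted_return(rewards, gamma):
    """Cumulative reward sum_t gamma**t * r_t, accumulated back to front."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def train_q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values Q[state][action] from interaction alone."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _episode in range(episodes):
        state = 0
        for _t in range(1000):  # step cap so every episode terminates
            # Epsilon-greedy: explore at random, otherwise exploit.
            # Ties are also broken at random.
            if rng.random() < epsilon or q[state][LEFT] == q[state][RIGHT]:
                action = rng.choice([LEFT, RIGHT])
            else:
                action = LEFT if q[state][LEFT] > q[state][RIGHT] else RIGHT
            next_state, reward, done = step(state, action)
            # Q-learning target: reward plus discounted best next value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Policy that picks the higher-valued action in each non-terminal state."""
    return [RIGHT if q[s][RIGHT] > q[s][LEFT] else LEFT
            for s in range(N_STATES - 1)]
```

Calling `greedy_policy(train_q_learning())` should yield a policy that moves right in every non-terminal state, since that is the only way to collect the reward in this toy corridor. The discount factor `gamma` is what turns "long-term reward" into a single number: rewards that arrive later count for less.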

Papers

Showing 11451-11475 of 15113 papers

| Title | Status | Hype |
|---|---|---|
| Algorithms in Multi-Agent Systems: A Holistic Perspective from Reinforcement Learning and Game Theory | | 0 |
| MIME: Mutual Information Minimisation Exploration | | 0 |
| Reward Shaping for Reinforcement Learning with Omega-Regular Objectives | | 0 |
| Model-based Multi-Agent Reinforcement Learning with Cooperative Prioritized Sweeping | | 0 |
| SEERL: Sample Efficient Ensemble Reinforcement Learning | | 0 |
| Robotic Grasp Manipulation Using Evolutionary Computing and Deep Reinforcement Learning | | 0 |
| Continuous-action Reinforcement Learning for Playing Racing Games: Comparing SPG to PPO | Code | 0 |
| Exploiting Language Instructions for Interpretable and Compositional Reinforcement Learning | | 0 |
| Learning to Locomote with Deep Neural-Network and CPG-based Control in a Soft Snake Robot | | 0 |
| Multi-Robot Formation Control Using Reinforcement Learning | | 0 |
| Statistical Inference of the Value Function for Reinforcement Learning in Infinite Horizon Settings | Code | 0 |
| Weakly Supervised Video Summarization by Hierarchical Reinforcement Learning | | 0 |
| Deep Reinforcement Learning for Complex Manipulation Tasks with Sparse Feedback | | 0 |
| Sparse Black-box Video Attack with Reinforcement Learning | Code | 0 |
| Reward Engineering for Object Pick and Place Training | Code | 0 |
| Deep Interactive Reinforcement Learning for Path Following of Autonomous Underwater Vehicle | | 0 |
| A storage expansion planning framework using reinforcement learning and simulation-based optimization | | 0 |
| Reinforcement Learning Tracking Control for Robotic Manipulator With Kernel-Based Dynamic Model | | 0 |
| On Computation and Generalization of Generative Adversarial Imitation Learning | | 0 |
| Perception and Navigation in Autonomous Systems in the Era of Learning: A Survey | | 0 |
| Sample-based Distributional Policy Gradient | | 0 |
| Multi-Agent Deep Reinforcement Learning for Cooperative Connected Vehicles | | 0 |
| On Thompson Sampling for Smoother-than-Lipschitz Bandits | | 0 |
| A Nonparametric Off-Policy Policy Gradient | Code | 0 |
| EEG-based Drowsiness Estimation for Driving Safety using Deep Q-Learning | | 0 |
Page 459 of 605

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PPG | Mean Normalized Performance | 0.76 | | Unverified |
| 2 | PPO | Mean Normalized Performance | 0.58 | | Unverified |