SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal of reinforcement learning is to find an optimal policy, that is, a decision-making strategy that maximizes the expected long-term reward.
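The interaction loop described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and does not come from this page: `ChainEnv` is a hypothetical five-state toy environment, tabular Q-learning is just one of many RL algorithms, and the step size, discount factor, and exploration rate are arbitrary choices.

```python
import random


def discounted_return(rewards, gamma=0.99):
    """Cumulative reward G = r_0 + gamma*r_1 + gamma^2*r_2 + ..."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


class ChainEnv:
    """Hypothetical toy environment: states 0..n-1 in a row.

    Reaching the last state pays +1 and ends the episode.
    """

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Action 0 moves left, action 1 moves right (clipped at both ends).
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def train_q_learning(env, episodes=500, alpha=0.5, gamma=0.9,
                     epsilon=0.1, max_steps=100, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(env.n_states)]  # q[state][action]
    for _ in range(episodes):
        s = env.reset()
        for _ in range(max_steps):
            # Explore with probability epsilon, or when the estimates tie.
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s_next, r, done = env.step(a)
            # Bootstrapped target: reward plus discounted best next value.
            target = r if done else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
            if done:
                break
    return q


def greedy_policy(q):
    """Pick the higher-valued action in each state (ties go right)."""
    return [0 if left > right else 1 for left, right in q]
```

Calling `greedy_policy(train_q_learning(ChainEnv()))` returns one action per state. In this toy setting the learned policy should move right toward the rewarding state, which is the sense in which the agent maximizes long-term reward.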

Papers

Showing 5876-5900 of 15113 papers

Title | Status | Hype
Self-supervised reinforcement learning for speaker localisation with the iCub humanoid robot | | 0
Self-supervised Reinforcement Learning with Independently Controllable Subgoals | | 0
Self-supervised Sequential Information Bottleneck for Robust Exploration in Deep Reinforcement Learning | | 0
Self-Supervised Sim-to-Real Adaptation for Visual Robotic Manipulation | | 0
Self-Supervised Structured Representations for Deep Reinforcement Learning | | 0
Self-timed Reinforcement Learning using Tsetlin Machine | | 0
Self Training Autonomous Driving Agent | | 0
A Self-Tuning Actor-Critic Algorithm | | 0
Self-Tuning Sectorization: Deep Reinforcement Learning Meets Broadcast Beam Optimization | | 0
Semantic-Aware Collaborative Deep Reinforcement Learning Over Wireless Cellular Networks | | 0
Semantic-Aware Remote Estimation of Multiple Markov Sources Under Constraints | | 0
Semantic Exploration from Language Abstractions and Pretrained Representations | | 0
Semantic Guidance of Dialogue Generation with Reinforcement Learning | | 0
Semantic Tracklets: An Object-Centric Representation for Visual Multi-Agent Reinforcement Learning | | 0
Semi-analytical Industrial Cooling System Model for Reinforcement Learning | | 0
Taming Multi-Agent Reinforcement Learning with Estimator Variance Reduction | | 0
Semi-Data-Aided Channel Estimation for MIMO Systems via Reinforcement Learning | | 0
Semi-On-Policy Training for Sample Efficient Multi-Agent Policy Gradients | | 0
Semi-pessimistic Reinforcement Learning | | 0
Semi-supervised Offline Reinforcement Learning with Pre-trained Decision Transformers | | 0
Semi-Supervised Off Policy Reinforcement Learning | | 0
Semi-Supervised QA with Generative Domain-Adaptive Nets | | 0
Semi-supervised reward learning for offline reinforcement learning | | 0
SeMOPO: Learning High-quality Model and Policy from Low-quality Offline Visual Datasets | | 0
Sensor Control for Information Gain in Dynamic, Sparse and Partially Observed Environments | | 0
Page 236 of 605

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | | Unverified