SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes the expected long-term reward.
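The interaction loop described above can be illustrated with a minimal sketch. It assumes a hypothetical five-state corridor environment (`ChainEnv`) and uses tabular Q-learning with epsilon-greedy exploration as one possible learning rule. The environment, the function names, and the hyperparameters are illustrative assumptions and are not taken from any paper listed on this page.

```python
import random


class ChainEnv:
    """Hypothetical 5-state corridor: start in state 0, reward 1.0 on reaching state 4."""

    n_states = 5
    n_actions = 2  # 0 = move left, 1 = move right

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        move = 1 if action == 1 else -1
        # Clamp the position so the agent stays inside the corridor.
        self.state = min(max(self.state + move, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, max_steps=100, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration (illustrative settings)."""
    rng = random.Random(seed)
    env = ChainEnv()
    q = [[0.0] * env.n_actions for _ in range(env.n_states)]
    for _ in range(episodes):
        state = env.reset()
        for _ in range(max_steps):
            if rng.random() < epsilon:
                action = rng.randrange(env.n_actions)  # explore
            else:
                best = max(q[state])  # exploit, breaking ties at random
                action = rng.choice(
                    [a for a in range(env.n_actions) if q[state][a] == best]
                )
            next_state, reward, done = env.step(action)
            # Bootstrapped target: immediate reward plus discounted best next value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Return the highest-valued action for every state."""
    return [max(range(len(row)), key=row.__getitem__) for row in q]


if __name__ == "__main__":
    q_table = train()
    print(greedy_policy(q_table))
```

In this sketch the learned greedy policy should choose action 1 (move right) in every non-terminal state, since that is the only way to reach the reward. The discount factor `gamma` is what makes the agent prefer shorter paths to the goal over longer ones.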

Papers

Showing 10926-10950 of 15113 papers

| Title | Status | Hype |
|---|---|---|
| Domain Knowledge Integration By Gradient Matching For Sample-Efficient Reinforcement Learning | | 0 |
| Intelligent Residential Energy Management System using Deep Reinforcement Learning | | 0 |
| Revisiting Parameter Sharing in Multi-Agent Deep Reinforcement Learning | Code | 0 |
| Time-Variant Variational Transfer for Value Functions | | 0 |
| Towards intervention-centric causal reasoning in learning agents | | 0 |
| Anomaly Detection Under Controlled Sensing Using Actor-Critic Reinforcement Learning | | 0 |
| A reinforcement learning approach to rare trajectory sampling | Code | 0 |
| ALBA : Reinforcement Learning for Video Object Segmentation | Code | 0 |
| Integrating LEO Satellite and UAV Relaying via Reinforcement Learning for Non-Terrestrial Networks | | 0 |
| Active Measure Reinforcement Learning for Observation Cost Minimization | | 0 |
| Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model | | 0 |
| Efficient Use of heuristics for accelerating XCS-based Policy Learning in Markov Games | | 0 |
| Gradient Monitored Reinforcement Learning | | 0 |
| Generator and Critic: A Deep Reinforcement Learning Approach for Slate Re-ranking in E-commerce | | 0 |
| Dynamic Value Estimation for Single-Task Multi-Scene Reinforcement Learning | | 0 |
| Deep Reinforcement Learning Based Power Allocation for D2D Network | | 0 |
| Deep Learning Models for Automatic Summarization | | 0 |
| Policy Entropy for Out-of-Distribution Classification | | 0 |
| Optimization-driven Deep Reinforcement Learning for Robust Beamforming in IRS-assisted Wireless Communications | | 0 |
| Meta-Reinforcement Learning for Trajectory Design in Wireless UAV Networks | | 0 |
| Reinforcement Learning with Iterative Reasoning for Merging in Dense Traffic | | 0 |
| Model-free Reinforcement Learning for Stochastic Stackelberg Security Games | | 0 |
| GoChat: Goal-oriented Chatbots with Hierarchical Reinforcement Learning | | 0 |
| Automatic Discovery of Interpretable Planning Strategies | Code | 0 |
| Evaluating Generalisation in General Video Game Playing | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PPG | Mean Normalized Performance | 0.76 | | Unverified |
| 2 | PPO | Mean Normalized Performance | 0.58 | | Unverified |