SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment to maximize a cumulative reward signal. The agent interacts with the environment and learns by receiving feedback in the form of rewards or punishments for its actions. The goal of reinforcement learning is to find the optimal policy or decision-making strategy that maximizes the long-term reward.
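
As a rough illustration of the loop described above (act, receive a reward, update, repeat), the sketch below uses tabular Q-learning with epsilon-greedy exploration. The technique, the five-cell corridor environment, and every name and hyperparameter (`step`, `train`, `alpha`, `gamma`, `epsilon`) are assumptions chosen for illustration; none of them come from this page or from any listed paper.

```python
import random

# Hypothetical toy environment (not from the source): a corridor of 5 cells.
# The agent starts in cell 0 and receives a reward of 1.0 on reaching cell 4.
N_STATES = 5
ACTIONS = (-1, +1)  # move left, move right


def step(state, action):
    """Apply an action; return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action_index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon (or when estimates tie);
            # otherwise exploit the action with the higher estimate.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Move the estimate toward reward + discounted best future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


q_table = train()
# Greedy policy for the non-terminal cells: index 1 means "move right".
policy = [0 if q[0] > q[1] else 1 for q in q_table[:-1]]
print(policy)
```

In this toy setting the only reward sits at the far end of the corridor, so the learned greedy policy should pick "move right" in every non-terminal cell. The discount factor `gamma` is what turns the single delayed reward into a long-term objective for the earlier cells.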

Papers

Showing 10901–10950 of 15113 papers

| Title | Status | Hype |
|---|---|---|
| Balancing Reinforcement Learning Training Experiences in Interactive Information Retrieval | | 0 |
| AutoHAS: Efficient Hyperparameter and Architecture Search | | 0 |
| State Action Separable Reinforcement Learning | | 0 |
| Refined Continuous Control of DDPG Actors via Parametrised Activation | | 0 |
| Visual Transfer for Reinforcement Learning via Wasserstein Domain Confusion | Code | 0 |
| Meta-Model-Based Meta-Policy Optimization | | 0 |
| Constrained Reinforcement Learning for Dynamic Optimization under Uncertainty | | 0 |
| A Novel Update Mechanism for Q-Networks Based On Extreme Learning Machines | Code | 0 |
| Causality and Batch Reinforcement Learning: Complementary Approaches To Planning In Unknown Domains | | 0 |
| Learning to Scan: A Deep Reinforcement Learning Approach for Personalized Scanning in CT Imaging | | 0 |
| The Value-Improvement Path: Towards Better Representations for Reinforcement Learning | | 0 |
| Temporally-Extended ε-Greedy Exploration | Code | 0 |
| Diversity Actor-Critic: Sample-Aware Entropy Regularization for Sample-Efficient Exploration | Code | 0 |
| Jointly Learning Environments and Control Policies with Projected Stochastic Gradient Ascent | Code | 0 |
| Active Vision for Early Recognition of Human Actions | | 0 |
| A novel approach for multi-agent cooperative pursuit to capture grouped evaders | | 0 |
| Reinforcement learning and Bayesian data assimilation for model-informed precision dosing in oncology | | 0 |
| Mitigating Bias in Face Recognition Using Skewness-Aware Reinforcement Learning | | 0 |
| Temporal-Differential Learning in Continuous Environments | | 0 |
| Robust Reinforcement Learning with Wasserstein Constraint | | 0 |
| Model-Based Reinforcement Learning with Value-Targeted Regression | | 0 |
| Variational Reward Estimator Bottleneck: Learning Robust Reward Estimator for Multi-Domain Task-Oriented Dialog | | 0 |
| MM-KTD: Multiple Model Kalman Temporal Differences for Reinforcement Learning | Code | 0 |
| Reinforcement Learning | Code | 0 |
| AI-based Resource Allocation: Reinforcement Learning for Adaptive Auto-scaling in Serverless Environments | | 0 |
| Domain Knowledge Integration By Gradient Matching For Sample-Efficient Reinforcement Learning | | 0 |
| Intelligent Residential Energy Management System using Deep Reinforcement Learning | | 0 |
| Revisiting Parameter Sharing in Multi-Agent Deep Reinforcement Learning | Code | 0 |
| Time-Variant Variational Transfer for Value Functions | | 0 |
| Towards intervention-centric causal reasoning in learning agents | | 0 |
| Anomaly Detection Under Controlled Sensing Using Actor-Critic Reinforcement Learning | | 0 |
| A reinforcement learning approach to rare trajectory sampling | Code | 0 |
| ALBA : Reinforcement Learning for Video Object Segmentation | Code | 0 |
| Integrating LEO Satellite and UAV Relaying via Reinforcement Learning for Non-Terrestrial Networks | | 0 |
| Active Measure Reinforcement Learning for Observation Cost Minimization | | 0 |
| Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model | | 0 |
| Efficient Use of heuristics for accelerating XCS-based Policy Learning in Markov Games | | 0 |
| Gradient Monitored Reinforcement Learning | | 0 |
| Generator and Critic: A Deep Reinforcement Learning Approach for Slate Re-ranking in E-commerce | | 0 |
| Dynamic Value Estimation for Single-Task Multi-Scene Reinforcement Learning | | 0 |
| Deep Reinforcement Learning Based Power Allocation for D2D Network | | 0 |
| Deep Learning Models for Automatic Summarization | | 0 |
| Policy Entropy for Out-of-Distribution Classification | | 0 |
| Optimization-driven Deep Reinforcement Learning for Robust Beamforming in IRS-assisted Wireless Communications | | 0 |
| Meta-Reinforcement Learning for Trajectory Design in Wireless UAV Networks | | 0 |
| Reinforcement Learning with Iterative Reasoning for Merging in Dense Traffic | | 0 |
| Model-free Reinforcement Learning for Stochastic Stackelberg Security Games | | 0 |
| GoChat: Goal-oriented Chatbots with Hierarchical Reinforcement Learning | | 0 |
| Automatic Discovery of Interpretable Planning Strategies | Code | 0 |
| Evaluating Generalisation in General Video Game Playing | | 0 |
Page 219 of 303

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PPG | Mean Normalized Performance | 0.76 | | Unverified |
| 2 | PPO | Mean Normalized Performance | 0.58 | | Unverified |