SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes long-term reward.
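The interaction loop described above can be sketched with tabular Q-learning on a toy problem. This is a minimal illustration, not a method taken from any paper listed here: the five-state corridor environment, the `step` and `train` helpers, and all hyperparameters are hypothetical choices made for the example.

```python
import random

N_STATES = 5          # states 0..4; state 4 is terminal (hypothetical toy corridor)
ACTIONS = (0, 1)      # 0 = step left, 1 = step right


def step(state, action):
    """Toy environment dynamics: return (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    # Reward of 1 only when the goal state is reached, 0 otherwise.
    return next_state, (1.0 if done else 0.0), done


def greedy_action(q, state, rng):
    """Pick the highest-valued action, breaking ties at random."""
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Run epsilon-greedy Q-learning and return the learned Q-table."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on episode length
            # Explore with probability epsilon, otherwise act greedily.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy_action(q, state, rng)
            next_state, reward, done = step(state, action)
            # One-step target: reward plus discounted best next value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = train()
# Greedy policy for each non-terminal state.
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

Under these assumptions the learned greedy policy should move right in every non-terminal state, since that is the shortest path to the only rewarded state. The discount factor `gamma` is what makes the agent weigh long-term reward rather than only the immediate one.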

Papers

Showing 11351-11375 of 15113 papers

| Title | Status | Hype |
| --- | --- | --- |
| Fast Reinforcement Learning for Anti-jamming Communications | | 0 |
| MODRL/D-AM: Multiobjective Deep Reinforcement Learning Algorithm Using Decomposition and Attention Model for Multiobjective Optimization | | 0 |
| Multi-Vehicle Routing Problems with Soft Time Windows: A Multi-Agent Reinforcement Learning Approach | | 0 |
| Regret Bounds for Discounted MDPs | | 0 |
| On the Convergence Theory of Debiased Model-Agnostic Meta-Reinforcement Learning | Code | 0 |
| A Tensor Network Approach to Finite Markov Decision Processes | | 0 |
| Data Efficient Training for Reinforcement Learning with Adaptive Behavior Policy Sharing | | 0 |
| HMRL: Hyper-Meta Learning for Sparse Reward Reinforcement Learning Problem | | 0 |
| Confounding-Robust Policy Evaluation in Infinite-Horizon Reinforcement Learning | | 0 |
| Learning Structured Communication for Multi-agent Reinforcement Learning | | 0 |
| Learning to Switch Among Agents in a Team via 2-Layer Markov Decision Processes | | 0 |
| Machine Learning Approaches For Motor Learning: A Short Review | | 0 |
| Towards Intelligent Pick and Place Assembly of Individualized Products Using Reinforcement Learning | | 0 |
| Provable Self-Play Algorithms for Competitive Reinforcement Learning | | 0 |
| On the Convergence of the Monte Carlo Exploring Starts Algorithm for Reinforcement Learning | | 0 |
| On Reward Shaping for Mobile Robot Navigation: A Reinforcement Learning and SLAM Based Approach | | 0 |
| Proficiency Constrained Multi-Agent Reinforcement Learning for Environment-Adaptive Multi UAV-UGV Teaming | | 0 |
| Discrete Action On-Policy Learning with Action-Value Critic | Code | 0 |
| Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions | | 0 |
| Reward Tweaking: Maximizing the Total Reward While Planning for Short Horizons | | 0 |
| RL-Duet: Online Music Accompaniment Generation Using Deep Reinforcement Learning | | 0 |
| Multi-task Reinforcement Learning with a Planning Quasi-Metric | | 0 |
| BRPO: Batch Residual Policy Optimization | | 0 |
| Inferential Induction: A Novel Framework for Bayesian Reinforcement Learning | | 0 |
| Conservative Exploration in Reinforcement Learning | | 0 |
Page 455 of 605

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PPG | Mean Normalized Performance | 0.76 | | Unverified |
| 2 | PPO | Mean Normalized Performance | 0.58 | | Unverified |