SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes long-term reward.
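The interaction loop described above can be sketched with a small example. This is a minimal illustration, not a method taken from any paper listed on this page: the five-cell corridor environment, the `step` function, and the hyperparameters (`alpha`, `gamma`, `episodes`) are assumptions chosen for brevity. Tabular Q-learning is used here only as one common way to learn a reward-maximizing policy, and the discount factor `gamma` is one common way to define "long-term reward".

```python
import random

N_STATES = 5          # corridor cells 0..4; cell 4 is the rewarding terminal state
ACTIONS = (0, 1)      # 0 = step left, 1 = step right


def step(state, action):
    """Hypothetical toy environment: returns (next_state, reward, done)."""
    next_state = max(state - 1, 0) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0   # feedback signal: reward only at the goal
    return next_state, reward, done


def train_q_learning(episodes=500, alpha=0.5, gamma=0.9, max_steps=100, seed=0):
    """Learn action values Q[state][action] from interaction (tabular Q-learning)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            # Q-learning is off-policy, so a uniformly random behaviour
            # policy is enough to explore this tiny environment.
            action = rng.choice(ACTIONS)
            next_state, reward, done = step(state, action)
            # Target: immediate reward plus discounted best future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            if done:
                break
            state = next_state
    return q


def greedy_policy(q):
    """Policy that picks the highest-valued action in each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    q_table = train_q_learning()
    print(greedy_policy(q_table))
```

In this sketch the learned greedy policy is the "decision-making strategy" and the Q-table estimates the discounted cumulative reward of each action. Real benchmarks replace the toy corridor with a full environment and the table with a function approximator.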

Papers

Showing 6976-7000 of 15113 papers

Title | Status | Hype
Neural PPO-Clip Attains Global Optimality: A Hinge Loss Perspective |  | 0
Distributional Reinforcement Learning for Multi-Dimensional Reward Functions | Code | 0
Automating Control of Overestimation Bias for Reinforcement Learning |  | 0
A Deep Reinforcement Learning Approach for Audio-based Navigation and Audio Source Localization in Multi-speaker Environments |  | 0
Common Information based Approximate State Representations in Multi-Agent Reinforcement Learning |  | 0
Operator Shifting for Model-based Policy Evaluation |  | 0
Which Model to Trust: Assessing the Influence of Models on the Performance of Reinforcement Learning Algorithms for Continuous Control Tasks | Code | 0
Mixture-of-Variational-Experts for Continual Learning | Code | 0
Unsupervised Domain Adaptation with Dynamics-Aware Rewards in Reinforcement Learning |  | 0
Recurrent Off-policy Baselines for Memory-based Continuous Control | Code | 1
Uniformly Conservative Exploration in Reinforcement Learning | Code | 1
Self-Consistent Models and Values |  | 0
Goal-Aware Cross-Entropy for Multi-Target Reinforcement Learning | Code | 1
Learning What to Memorize: Using Intrinsic Motivation to Form Useful Memory in Partially Observable Reinforcement Learning |  | 0
Can Q-Learning be Improved with Advice? |  | 0
Deep Reinforcement Learning for Simultaneous Sensing and Channel Access in Cognitive Networks |  | 0
Understanding the World Through Action | Code | 1
False Correlation Reduction for Offline Reinforcement Learning | Code | 1
Analysis of Thompson Sampling for Partially Observable Contextual Multi-Armed Bandits |  | 0
Fully Distributed Actor-Critic Architecture for Multitask Deep Reinforcement Learning |  | 0
Foresight of Graph Reinforcement Learning Latent Permutations Learnt by Gumbel Sinkhorn Network |  | 0
Policy Search using Dynamic Mirror Descent MPC for Model Free Off Policy RL |  | 0
Off-policy Reinforcement Learning with Optimistic Exploration and Distribution Correction |  | 0
A Reinforcement Learning Approach to Parameter Selection for Distributed Optimal Power Flow |  | 0
Convergence Rates of Average-Reward Multi-agent Reinforcement Learning via Randomized Linear Programming |  | 0
Page 280 of 605

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 |  | Unverified
2 | PPO | Mean Normalized Performance | 0.58 |  | Unverified