SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find the optimal policy, that is, the decision-making strategy that maximizes long-term reward.
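The interaction loop described above can be sketched in a few lines. This is only an illustration under stated assumptions: the five-state chain environment, the function names (`step`, `train`, `greedy_policy`), and all hyperparameters are hypothetical, and tabular Q-learning is just one common algorithm picked for the example, not something this page prescribes.

```python
import random

def step(state, action, n_states=5):
    """Hypothetical chain environment: action 1 moves right, action 0 moves left.
    Reaching the last state yields reward 1 and ends the episode."""
    if action == 1:
        next_state = min(state + 1, n_states - 1)
    else:
        next_state = max(state - 1, 0)
    done = next_state == n_states - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done

def train(episodes=500, n_states=5, alpha=0.1, gamma=0.9,
          epsilon=0.1, max_steps=100, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration (illustrative values)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            # Explore with probability epsilon, or when both actions look equal.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action, n_states)
            # Move the estimate toward reward plus discounted future value.
            future = 0.0 if done else gamma * max(q[next_state])
            q[state][action] += alpha * (reward + future - q[state][action])
            state = next_state
            if done:
                break
    return q

def greedy_policy(q):
    """Pick the action with the highest estimated value in each state."""
    return [0 if left > right else 1 for left, right in q]
```

Calling `greedy_policy(train())` returns one action per state; in this toy setup the learned policy should move right from every non-terminal state, since that is the only way to collect the reward.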

Papers

Showing 2901-2925 of 15113 papers

| Title | Status | Hype |
| --- | --- | --- |
| Improved Off-policy Reinforcement Learning in Biological Sequence Design | Code | 0 |
| Improved robustness of reinforcement learning policies upon conversion to spiking neuronal network platforms applied to ATARI games | Code | 0 |
| Improving Portfolio Optimization Results with Bandit Networks | Code | 0 |
| Contrastive Explanations for Reinforcement Learning via Embedded Self Predictions | Code | 0 |
| Improving reinforcement learning algorithms: towards optimal learning rate policies | Code | 0 |
| Improving Reinforcement Learning Based Image Captioning with Natural Language Prior | Code | 0 |
| Importance Weighted Actor-Critic for Optimal Conservative Offline Reinforcement Learning | Code | 0 |
| Improved Sample Complexity Bounds for Distributionally Robust Reinforcement Learning | Code | 0 |
| Implicit Quantile Networks for Distributional Reinforcement Learning | Code | 0 |
| Continuous Doubly Constrained Batch Reinforcement Learning | Code | 0 |
| Importance Prioritized Policy Distillation | Code | 0 |
| Improving the Efficient Neural Architecture Search via Rewarding Modifications | Code | 0 |
| Information State Embedding in Partially Observable Cooperative Multi-Agent Reinforcement Learning | Code | 0 |
| Dealing with uncertainty: balancing exploration and exploitation in deep recurrent reinforcement learning | Code | 0 |
| Continuous Deep Q-Learning with Simulator for Stabilization of Uncertain Discrete-Time Systems | Code | 0 |
| A general class of surrogate functions for stable and efficient reinforcement learning | Code | 0 |
| Artificial Intelligence for Prosthetics - challenge solutions | Code | 0 |
| Impartial Games: A Challenge for Reinforcement Learning | Code | 0 |
| Continuous Control With Ensemble Deep Deterministic Policy Gradients | Code | 0 |
| Imitation Learning by Reinforcement Learning | Code | 0 |
| Imitation Learning for Sentence Generation with Dilated Convolutions Using Adversarial Training | Code | 0 |
| Imperfect also Deserves Reward: Multi-Level and Sequential Reward Modeling for Better Dialog Management | Code | 0 |
| Controllable Neural Story Plot Generation via Reward Shaping | Code | 0 |
| Incorporating Rivalry in Reinforcement Learning for a Competitive Game | Code | 0 |
| Imagining In-distribution States: How Predictable Robot Behavior Can Enable User Control Over Learned Policies | Code | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PPG | Mean Normalized Performance | 0.76 | | Unverified |
| 2 | PPO | Mean Normalized Performance | 0.58 | | Unverified |