SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes the long-term reward.
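
The agent-environment loop described above can be sketched in a few lines. The example below is only an illustration, not a method taken from any paper listed on this page: it assumes a hypothetical five-state line world (`step`) in which reaching the rightmost state pays a reward of 1, and it uses tabular Q-learning with epsilon-greedy exploration as one possible learning rule. All names and hyperparameters (`train`, `alpha`, `gamma`, `epsilon`) are illustrative choices.

```python
import random

N_STATES = 5          # states 0..4 on a line; state 4 is the goal (assumed toy setup)
ACTIONS = (0, 1)      # 0 = step left, 1 = step right


def step(state, action):
    """Hypothetical toy environment: return (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy selection; ties between the two actions are broken at random."""
    if rng.random() < epsilon or q[state][0] == q[state][1]:
        return rng.choice(ACTIONS)
    return 0 if q[state][0] > q[state][1] else 1


def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Run tabular Q-learning and return the learned action-value table."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = choose_action(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # Bootstrapped target: immediate reward plus discounted best next value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Best action in each non-terminal state according to the Q-table."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    q_table = train()
    print(greedy_policy(q_table))
```

In this toy setting the learned greedy policy should step right in every non-terminal state, since that is the shortest route to the only reward. The discount factor `gamma` is what makes the agent prefer reaching the goal sooner, which is one simple way of expressing "long-term reward".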

Papers

Showing 14401-14450 of 15113 papers

Title | Status | Hype
A method for the online construction of the set of states of a Markov Decision Process using Answer Set Programming | | 0
UCB Exploration via Q-Ensembles | | 0
Towards Synthesizing Complex Programs from Input-Output Examples | | 0
Actor-Critic for Linearly-Solvable Continuous MDP with Partially Known Dynamics | | 0
Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning | | 0
Reinforcement Learning for Learning Rate Control | | 0
The Atari Grand Challenge Dataset | Code | 0
Sequential Dynamic Decision Making with Deep Neural Nets on a Test-Time Budget | | 0
Universal Reinforcement Learning Algorithms: Survey and Experiments | Code | 0
Constrained Policy Optimization | Code | 0
Fine-grained acceleration control for autonomous intersection management using deep reinforcement learning | | 0
Experience Replay Using Transition Sequences | | 0
Objective-Reinforced Generative Adversarial Networks (ORGAN) for Sequence Generation Models | Code | 0
End-to-end Active Object Tracking via Reinforcement Learning | | 0
Free energy-based reinforcement learning using a quantum processor | Code | 0
Latent Intention Dialogue Models | Code | 0
Role Playing Learning for Socially Concomitant Mobile Robot Navigation | | 0
Boltzmann Exploration Done Right | | 0
Cross-Domain Perceptual Reward Functions | | 0
First-spike based visual categorization using reward-modulated STDP | | 0
State Space Decomposition and Subgoal Creation for Transfer in Deep Reinforcement Learning | | 0
Continuous State-Space Models for Optimal Sepsis Treatment - a Deep Reinforcement Learning Approach | | 0
Enhanced Experience Replay Generation for Efficient Reinforcement Learning | | 0
Visual Semantic Planning using Deep Successor Representations | | 0
Reinforcement Learning with a Corrupted Reward Channel | Code | 0
Safe Model-based Reinforcement Learning with Stability Guarantees | Code | 0
Thinking Fast and Slow with Deep Learning and Tree Search | Code | 1
A unified view of entropy-regularized Markov decision processes | | 0
Concrete Dropout | Code | 0
AIXIjs: A Software Demo for General Reinforcement Learning | Code | 0
Guide Actor-Critic for Continuous Control | Code | 0
Ask the Right Questions: Active Question Reformulation with Reinforcement Learning | Code | 0
Experience enrichment based task independent reward model | | 0
Learning to Mix n-Step Returns: Generalizing lambda-Returns for Deep Reinforcement Learning | | 0
Shallow Updates for Deep Reinforcement Learning | | 0
Learning to Factor Policies and Action-Value Functions: Factored Action Space Representations for Deep Reinforcement learning | | 0
Batch Reinforcement Learning on the Industrial Benchmark: First Experiences | | 0
Atari games and Intel processors | | 0
A Comparison of Reinforcement Learning Techniques for Fuzzy Cloud Auto-Scaling | | 0
Posterior sampling for reinforcement learning: worst-case regret bounds | | 0
Delving into adversarial attacks on deep policies | | 0
Feature Control as Intrinsic Motivation for Hierarchical Reinforcement Learning | Code | 0
ParlAI: A Dialog Research Software Platform | Code | 1
Automatic Goal Generation for Reinforcement Learning Agents | Code | 0
New Reinforcement Learning Using a Chaotic Neural Network for Emergence of "Thinking" - "Exploration" Grows into "Thinking" through Learning - | | 0
Emotion in Reinforcement Learning Agents and Robots: A Survey | | 0
Repeated Inverse Reinforcement Learning | | 0
Efficient Parallel Methods for Deep Reinforcement Learning | Code | 0
A Deep Reinforced Model for Abstractive Summarization | Code | 1
Policy Iterations for Reinforcement Learning Problems in Continuous Time and Space -- Fundamental Theory and Methods | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | | Unverified