SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes long-term reward.
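As a rough illustration of the agent-environment loop described above, here is a minimal sketch of tabular Q-learning on a toy problem. The five-state chain, the reward of 1.0 at the right end, the hyperparameters, and the names `step`, `train`, and `greedy_policy` are all assumptions made up for this example; they do not come from any paper listed on this page.

```python
import random

N_STATES = 5      # hypothetical toy chain: states 0..4, state 4 is terminal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Toy environment dynamics: return (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0  # reward only on reaching the right end
    return next_state, reward, done


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Learn action values with epsilon-greedy tabular Q-learning."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                best = max(q[state])          # exploit, breaking ties randomly
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Bootstrapped target: immediate reward plus discounted future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Best action in each non-terminal state under the learned values."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

In this sketch the discount factor `gamma` is what turns "long-term reward" into a concrete quantity: states farther from the goal end up with smaller learned values, so the greedy policy moves toward the rewarding end of the chain.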

Papers

Showing 14901–14950 of 15113 papers

| Title | Status | Hype |
|---|---|---|
| Multi-agent Reinforcement Learning with Sparse Interactions by Negotiation and Knowledge Transfer | | 0 |
| Distributed Deep Q-Learning | | 0 |
| Action-Conditional Video Prediction using Deep Networks in Atari Games | Code | 0 |
| A Reinforcement Learning Approach to Online Learning of Decision Trees | | 0 |
| Reinforcement Learning for the Unit Commitment Problem | | 0 |
| Maximum Entropy Deep Inverse Reinforcement Learning | Code | 0 |
| Massively Parallel Methods for Deep Reinforcement Learning | Code | 0 |
| On the Computability of Solomonoff Induction and Knowledge-Seeking | | 0 |
| Experimental analysis of data-driven control for a building heating system | | 0 |
| Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models | Code | 0 |
| Online Transfer Learning in Reinforcement Learning Domains | | 0 |
| Bootstrapped Thompson Sampling and Deep Exploration | | 0 |
| Language Understanding for Text-based Games Using Deep Reinforcement Learning | Code | 0 |
| Bootstrapping Skills | | 0 |
| The Online Coupon-Collector Problem and Its Application to Lifelong Reinforcement Learning | | 0 |
| Cumulative Prospect Theory Meets Reinforcement Learning: Prediction and Control | | 0 |
| A Framework for Constrained and Adaptive Behavior-Based Agents | | 0 |
| Local Nonstationarity for Efficient Bayesian Optimization | | 0 |
| Safe Policy Search for Lifelong Reinforcement Learning with Sublinear Regret | | 0 |
| A Definition of Happiness for Reinforcement Learning Agents | | 0 |
| Reinforcement Learning applied to Single Neuron | | 0 |
| Learning Where to Sample in Structured Prediction | Code | 0 |
| Context-Aware Mobility Management in HetNets: A Reinforcement Learning Approach | | 0 |
| Optimal Neuron Selection: NK Echo State Networks for Reinforcement Learning | | 0 |
| Reinforcement Learning Neural Turing Machines - Revised | Code | 0 |
| Stability of Stochastic Approximations with 'Controlled Markov' Noise and Temporal Difference Learning | | 0 |
| Residential Demand Response Applications Using Batch Reinforcement Learning | | 0 |
| Correct-by-synthesis reinforcement learning with temporal logic constraints | | 0 |
| Human level control through deep reinforcement learning | Code | 0 |
| Reinforcement Learning in a Neurally Controlled Robot Using Dopamine Modulated STDP | | 0 |
| Gaussian Processes for Data-Efficient Learning in Robotics and Control | Code | 0 |
| Efficient model-based reinforcement learning for approximate online optimal | | 0 |
| From Pixels to Torques: Policy Learning with Deep Dynamical Models | | 0 |
| Multiple Object Recognition with Visual Attention | Code | 0 |
| Regression with Linear Factored Functions | | 0 |
| Grounding Hierarchical Reinforcement Learning Models for Knowledge Transfer | | 0 |
| Reinforcement Learning and Nonparametric Detection of Game-Theoretic Equilibrium Play in Social Networks | | 0 |
| Sparse Multi-Task Reinforcement Learning | | 0 |
| RAAM: The Benefits of Robustness in Approximating Aggregated MDPs in Reinforcement Learning | | 0 |
| Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning | | 0 |
| Difference of Convex Functions Programming for Reinforcement Learning | | 0 |
| Design Principles of the Hippocampal Cognitive Map | | 0 |
| "How hard is my MDP?" The distribution-norm to the rescue | | 0 |
| Multiple Instance Reinforcement Learning for Efficient Weakly-Supervised Detection in Images | | 0 |
| Compress and Control | | 0 |
| A Comparison of learning algorithms on the Arcade Learning Environment | | 0 |
| Do Artificial Reinforcement-Learning Agents Matter Morally? | | 0 |
| Domain-Independent Optimistic Initialization for Reinforcement Learning | | 0 |
| Don't Until the Final Verb Wait: Reinforcement Learning for Simultaneous Machine Translation | | 0 |
| Fear the REAPER: A System for Automatic Multi-Document Summarization with Reinforcement Learning | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PPG | Mean Normalized Performance | 0.76 | | Unverified |
| 2 | PPO | Mean Normalized Performance | 0.58 | | Unverified |