SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from the feedback it receives, in the form of rewards or penalties, for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes long-term reward.
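
As a rough illustration of the agent-environment loop described above, here is a minimal sketch using tabular Q-learning on a toy corridor. The corridor task, the function names (`step`, `choose_action`, `train`) and all hyperparameters are assumptions made for this example only; Q-learning is just one of many RL methods and is not prescribed by this page.

```python
import random

N_STATES = 5      # hypothetical corridor with cells 0..4; cell 4 is the goal
ACTIONS = (0, 1)  # 0 = step left, 1 = step right


def step(state, action):
    """Toy environment: return (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    # Reward of 1 only when the goal cell is reached, 0 otherwise.
    return next_state, (1.0 if done else 0.0), done


def choose_action(q_row, epsilon, rng):
    """Epsilon-greedy action selection with random tie-breaking."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q_row)
    return rng.choice([a for a in ACTIONS if q_row[a] == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, max_steps=100, seed=0):
    """Learn action values from reward feedback (assumed hyperparameters)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action = choose_action(q[state], epsilon, rng)
            next_state, reward, done = step(state, action)
            # Q-learning target: immediate reward plus discounted best next value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = train()
# Greedy policy for the non-terminal cells: pick the higher-valued action.
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(greedy_policy)  # expected: [1, 1, 1, 1], i.e. always step right
```

The learned policy here is simply "step right in every cell", which maximizes the discounted long-term reward in this toy task. Real benchmarks use far larger state spaces and usually function approximation instead of a table.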

Papers

Showing 14751–14800 of 15113 papers

Title | Status | Hype
A Hierarchical Reinforcement Learning Method for Persistent Time-Sensitive Tasks | | 0
On Reward Function for Survival | | 0
Deep Reinforcement Learning Discovers Internal Models | | 0
Successor Features for Transfer in Reinforcement Learning | | 0
Natural Language Generation as Planning under Uncertainty Using Reinforcement Learning | | 0
Deep Reinforcement Learning With Macro-Actions | | 0
Progressive Neural Networks | Code | 1
Model-Free Episodic Control | Code | 0
Deep Reinforcement Learning with a Combinatorial Action Space for Predicting Popular Reddit Threads | Code | 0
Generative Adversarial Imitation Learning | Code | 1
Policy Networks with Two-Stage Training for Dialogue Systems | | 0
Face valuing: Training user interfaces with facial expressions and reinforcement learning | | 0
Cooperative Inverse Reinforcement Learning | Code | 0
Deep Successor Reinforcement Learning | Code | 0
Continuously Learning Neural Dialogue Management | | 0
Towards End-to-End Learning for Dialog State Tracking and Management using Deep Reinforcement Learning | Code | 0
Safe and Efficient Off-Policy Reinforcement Learning | Code | 0
Adapting Sampling Interval of Sensor Networks Using On-Line Reinforcement Learning | | 0
Learning to Optimize | | 0
Unifying Count-Based Exploration and Intrinsic Motivation | Code | 0
OpenAI Gym | Code | 1
Deep Reinforcement Learning for Dialogue Generation | Code | 0
Deep Q-Networks for Accelerating the Training of Deep Neural Networks | | 0
End-to-end LSTM-based dialog control optimized with supervised and reinforcement learning | | 0
Reinforcement Learning for Semantic Segmentation in Indoor Scenes | | 0
Difference of Convex Functions Programming Applied to Control with Expert Data | | 0
Death and Suicide in Universal Artificial Intelligence | | 0
Reinforcement Learning for Visual Object Detection | | 0
Information Theoretically Aided Reinforcement Learning for Embodied Agents | | 0
VIME: Variational Information Maximizing Exploration | Code | 0
Control of Memory, Active Perception, and Action in Minecraft | | 0
Deep Reinforcement Learning Radio Control and Signal Detection with KeRLym, a Gym RL Agent | Code | 0
Model-Free Imitation Learning with Policy Optimization | | 0
A PAC RL Algorithm for Episodic POMDPs | | 0
Learning and Policy Search in Stochastic Dynamical Systems with Bayesian Neural Networks | Code | 0
Learning to Communicate with Deep Multi-Agent Reinforcement Learning | Code | 0
Localizing by Describing: Attribute-Guided Attention Localization for Fine-Grained Recognition | | 0
Option Discovery in Hierarchical Reinforcement Learning using Spatio-Temporal Clustering | | 0
A Reinforcement Learning System to Encourage Physical Activity in Diabetes Patients | | 0
Optimizing human-interpretable dialog management policy using Genetic Algorithm | | 0
Avoiding Wireheading with Value Reinforcement Learning | | 0
ViZDoom: A Doom-based AI Research Platform for Visual Reinforcement Learning | Code | 0
Classifying Options for Deep Reinforcement Learning | | 0
Tournament selection in zeroth-level classifier systems based on average reward reinforcement learning | | 0
Using Reinforcement Learning to Validate Empirical Game-Theoretic Analysis: A Continuous Double Auction Study | | 0
Benchmarking Deep Reinforcement Learning for Continuous Control | Code | 2
Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation | Code | 0
Inverse Reinforcement Learning with Simultaneous Estimation of Rewards and Dynamics | | 0
Theoretically-Grounded Policy Advice from Multiple Teachers in Reinforcement Learning Settings with Applications to Negative Transfer | | 0
A statistical learning strategy for closed-loop control of fluid flows | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | | Unverified