SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal of reinforcement learning is to find an optimal policy, that is, a decision-making strategy that maximizes the long-term reward.
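
As a rough illustration of the agent-environment loop described above, here is a minimal sketch using tabular Q-learning on a toy problem. Everything in it is an assumption made for illustration and is not taken from any paper listed on this page: the five-state chain environment, the `step`, `train`, and `greedy_policy` helpers, and the hyperparameter values are all hypothetical.

```python
import random

# Hypothetical toy environment: a chain of 5 states (0..4).
# The agent starts in state 0; reaching state 4 yields reward 1 and ends the episode.
N_STATES = 5
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon, or when both actions look equal.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # Feedback from the environment: move the estimate toward
            # reward + discounted value of the best next action.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Best action for each non-terminal state under the learned values."""
    return [0 if q_s[0] > q_s[1] else 1 for q_s in q[:-1]]


if __name__ == "__main__":
    q_table = train()
    print(greedy_policy(q_table))  # [1, 1, 1, 1]: always move right
```

The learned policy moves right in every state because that is the shortest path to the only reward; the discount factor `gamma` is what makes nearer rewards worth more than distant ones.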

Papers

Showing 13751–13800 of 15113 papers

| Title | Status | Hype |
| --- | --- | --- |
| Deep Pepper: Expert Iteration based Chess agent in the Reinforcement Learning Setting | | 0 |
| Efficient Entropy for Policy Gradient with Multidimensional Action Space | | 0 |
| Deep Progressive Reinforcement Learning for Skeleton-Based Action Recognition | | 0 |
| Fast Exploration with Simplified Models and Approximately Optimistic Planning in Model Based Reinforcement Learning | | 0 |
| Integrating Episodic Memory into a Reinforcement Learning Agent using Reservoir Sampling | | 0 |
| Bootstrapping a Neural Conversational Agent with Dialogue Self-Play, Crowdsourcing and On-Line Reinforcement Learning | | 0 |
| Improved Sample Complexity for Stochastic Compositional Variance Reduced Gradient | | 0 |
| Inference Aided Reinforcement Learning for Incentive Mechanism Design in Crowdsourcing | | 0 |
| A Reinforcement Learning Approach to Age of Information in Multi-User Networks | | 0 |
| Deep Reinforcement Learning of Region Proposal Networks for Object Detection | Code | 0 |
| Environment Upgrade Reinforcement Learning for Non-Differentiable Multi-Stage Pipelines | | 0 |
| GraphBit: Bitwise Interaction Mining via Deep Reinforcement Learning | | 0 |
| Equivalence Between Wasserstein and Value-Aware Loss for Model-based Reinforcement Learning | | 0 |
| SeedNet: Automatic Seed Generation With Deep Reinforcement Learning for Robust Interactive Segmentation | | 0 |
| Mining Evidences for Concept Stock Recommendation | | 0 |
| Quality Signals in Generated Stories | | 0 |
| Sequential Attacks on Agents for Long-Term Adversarial Goals | | 0 |
| Transfer Learning for Related Reinforcement Learning Tasks via Image-to-Image Translation | Code | 0 |
| Reinforced Continual Learning | Code | 0 |
| Sample-Efficient Deep Reinforcement Learning via Episodic Backward Update | Code | 0 |
| Learning a Prior over Intent via Meta-Inverse Reinforcement Learning | | 0 |
| Evaluating Reinforcement Learning Algorithms in Observational Health Settings | | 0 |
| Adversarial Learning of Task-Oriented Neural Dialog Models | | 0 |
| Bayesian Inference with Anchored Ensembles of Neural Networks, and Application to Exploration in Reinforcement Learning | Code | 0 |
| Depth and nonlinearity induce implicit exploration for RL | | 0 |
| Observe and Look Further: Achieving Consistent Performance on Atari | | 0 |
| Variational Inverse Control with Events: A General Framework for Data-Driven Reward Definition | | 0 |
| Supervised Policy Update for Deep Reinforcement Learning | Code | 0 |
| Virtuously Safe Reinforcement Learning | | 0 |
| Truncated Horizon Policy Search: Combining Reinforcement Learning & Imitation Learning | | 0 |
| Value Propagation Networks | | 0 |
| Memory Augmented Self-Play | Code | 0 |
| Hierarchical clustering with deep Q-learning | | 0 |
| Importance Weighted Transfer of Samples in Reinforcement Learning | | 0 |
| Fingerprint Policy Optimisation for Robust Reinforcement Learning | | 0 |
| Reliability and Learnability of Human Bandit Feedback for Sequence-to-Sequence Reinforcement Learning | Code | 0 |
| Fast Policy Learning through Imitation and Reinforcement | | 0 |
| Deep Reinforcement Learning in Ice Hockey for Context-Aware Player Evaluation | | 0 |
| Finite Sample Analysis of LSTD with Random Projections and Eligibility Traces | | 0 |
| Detecting Deceptive Reviews using Generative Adversarial Networks | | 0 |
| A Sliding-Window Algorithm for Markov Decision Processes with Arbitrarily Changing Rewards and Transitions | | 0 |
| Virtual-Taobao: Virtualizing Real-world Online Retail Environment for Reinforcement Learning | Code | 0 |
| Myopic Bayesian Design of Experiments via Posterior Sampling and Probabilistic Programming | Code | 0 |
| Reinforced Extractive Summarization with Question-Focused Rewards | | 0 |
| Visceral Machines: Risk-Aversion in Reinforcement Learning with Intrinsic Physiological Rewards | Code | 0 |
| Resource Allocation for a Wireless Coexistence Management System Based on Reinforcement Learning | | 0 |
| Meta-Gradient Reinforcement Learning | Code | 0 |
| Robust Distant Supervision Relation Extraction via Deep Reinforcement Learning | Code | 0 |
| A0C: Alpha Zero in Continuous Action Space | Code | 0 |
| Intelligent Trainer for Model-Based Reinforcement Learning | Code | 0 |
Page 276 of 303

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PPG | Mean Normalized Performance | 0.76 | | Unverified |
| 2 | PPO | Mean Normalized Performance | 0.58 | | Unverified |