SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes the expected long-term reward.
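
As an illustration of the agent-environment loop described above, here is a minimal sketch using tabular Q-learning on a toy problem. The `ChainEnv` corridor, the function names, and all hyperparameter values are hypothetical choices made for this example; they are not taken from any paper listed on this page, and Q-learning is only one of many ways to learn a policy.

```python
import random


class ChainEnv:
    """Hypothetical toy environment: a corridor of states 0..n-1.

    The agent starts at state 0 and receives reward 1.0 on reaching
    the last state, which ends the episode. All other steps give 0.
    """

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action 0 moves left, action 1 moves right; walls clamp the move.
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1,
          max_steps=100, seed=0):
    """Run epsilon-greedy tabular Q-learning and return the Q-table."""
    rng = random.Random(seed)
    env = ChainEnv()
    q = [[0.0, 0.0] for _ in range(env.n_states)]
    for _ in range(episodes):
        state = env.reset()
        for _ in range(max_steps):
            if rng.random() < epsilon:
                action = rng.randrange(2)  # explore
            else:
                # exploit, breaking ties between equal values at random
                best = max(q[state])
                action = rng.choice([a for a in (0, 1) if q[state][a] == best])
            next_state, reward, done = env.step(action)
            # Discounted return estimate: reward now plus best future value.
            future = 0.0 if done else gamma * max(q[next_state])
            q[state][action] += alpha * (reward + future - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Pick the highest-valued action in each state (ties go to action 0)."""
    return [max((0, 1), key=lambda a: row[a]) for row in q]


if __name__ == "__main__":
    q_table = train()
    print(greedy_policy(q_table))
```

In this sketch the reward is the only feedback the agent receives, and the discount factor `gamma` is what makes earlier states prefer the shorter path to the goal. The learned greedy policy for the non-terminal states should be to move right.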

Papers

Showing 11301–11350 of 15113 papers

| Title | Status | Hype |
| --- | --- | --- |
| Optimizing Multiagent Cooperation via Policy Evolution and Shared Experiences | | 0 |
| Variational Imitation Learning with Diverse-quality Demonstrations | Code | 1 |
| Responsive Safety in Reinforcement Learning | | 0 |
| OPtions as REsponses: Grounding behavioural hierarchies in multi-agent reinforcement learning | | 0 |
| The Natural Lottery Ticket Winner: Reinforcement Learning with Ordinary Neural Circuits | | 0 |
| Reinforcement Learning with Differential Privacy | | 0 |
| SVQN: Sequential Variational Soft Q-Learning Networks | | 0 |
| Long-Term Visitation Value for Deep Exploration in Sparse Reward Reinforcement Learning | Code | 0 |
| Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies | Code | 1 |
| Reinforcement Learning with Goal-Distance Gradient | | 0 |
| Deep Reinforcement Learning with Implicit Human Feedback | | 0 |
| Learning Representations in Reinforcement Learning: an Information Bottleneck Approach | | 0 |
| Deep Randomized Least Squares Value Iteration | | 0 |
| Improving the Generalization of Visual Navigation Policies using Invariance Regularization | | 0 |
| Way Off-Policy Batch Deep Reinforcement Learning of Human Preferences in Dialog | | 0 |
| Information Theoretic Model Predictive Q-Learning | | 0 |
| The Gambler's Problem and Beyond | | 0 |
| Uncertainty-Based Out-of-Distribution Classification in Deep Reinforcement Learning | | 0 |
| Reward-Conditioned Policies | Code | 0 |
| PAC Confidence Sets for Deep Neural Networks via Calibrated Prediction | Code | 1 |
| World Programs for Model-Based Learning and Planning in Compositional State and Action Spaces | | 0 |
| Deep Reinforced Self-Attention Masks for Abstractive Summarization (DR.SAS) | | 0 |
| A New Framework for Query Efficient Active Imitation Learning | | 0 |
| Computational model discovery with reinforcement learning | | 0 |
| Augmented Replay Memory in Reinforcement Learning With Continuous Control | | 0 |
| Individual specialization in multi-task environments with multiagent reinforcement learners | | 0 |
| Real-time Policy Distillation in Deep Reinforcement Learning | | 0 |
| Speeding up reinforcement learning by combining attention and agency features | | 0 |
| Weak Supervision for Fake News Detection via Reinforcement Learning | Code | 0 |
| SLM Lab: A Comprehensive Benchmark and Modular Software Framework for Reproducible Deep Reinforcement Learning | Code | 0 |
| Evolution Strategies Converges to Finite Differences | | 0 |
| Deep reinforcement learning for complex evaluation of one-loop diagrams in quantum field theory | | 0 |
| Crowdfunding Dynamics Tracking: A Reinforcement Learning Approach | | 0 |
| Quantum Logic Gate Synthesis as a Markov Decision Process | | 0 |
| Quasi-Newton Trust Region Policy Optimization | | 0 |
| Learning to Combat Compounding-Error in Model-Based Reinforcement Learning | | 0 |
| Explain Your Move: Understanding Agent Actions Using Specific and Relevant Feature Attribution | Code | 0 |
| A Survey of Deep Reinforcement Learning in Video Games | | 0 |
| Hamilton-Jacobi-Bellman Equations for Q-Learning in Continuous Time | | 0 |
| Direct and indirect reinforcement learning | | 0 |
| Discrete and Continuous Action Representation for Practical RL in Video Games | Code | 0 |
| Learning to Navigate Using Mid-Level Visual Priors | Code | 0 |
| Monte-Carlo Tree Search for Policy Optimization | | 0 |
| Parameterized Indexed Value Function for Efficient Exploration in Reinforcement Learning | Code | 0 |
| Towards Practical Multi-Object Manipulation using Relational Reinforcement Learning | Code | 0 |
| Variational Recurrent Models for Solving Partially Observable Control Tasks | Code | 0 |
| Energy-Aware Multi-Server Mobile Edge Computing: A Deep Reinforcement Learning Approach | | 0 |
| Predictive Coding for Boosting Deep Reinforcement Learning with Sparse Rewards | | 0 |
| Online Reinforcement Learning of Optimal Threshold Policies for Markov Decision Processes | | 0 |
| Can Agents Learn by Analogy? An Inferable Model for PAC Reinforcement Learning | Code | 0 |
Page 227 of 303

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PPG | Mean Normalized Performance | 0.76 | | Unverified |
| 2 | PPO | Mean Normalized Performance | 0.58 | | Unverified |