SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or punishments for its actions. The goal of reinforcement learning is to find the optimal policy, that is, the decision-making strategy that maximizes the long-term reward.
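The interaction loop described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from any paper listed on this page: a hypothetical five-state chain environment (`step`), a reward of 1 at the rightmost state, a discount factor `GAMMA`, and tabular Q-learning with epsilon-greedy exploration as one possible learning rule.

```python
import random

N_STATES = 5        # hypothetical chain: states 0..4, state 4 is the goal
ACTIONS = (0, 1)    # 0 = move left, 1 = move right
GAMMA = 0.9         # assumed discount factor for the long-term reward


def step(state, action):
    """Toy environment: move along the chain; reward 1 only on reaching the goal."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def discounted_return(rewards, gamma=GAMMA):
    """Cumulative reward r_0 + gamma * r_1 + gamma^2 * r_2 + ..."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def greedy_action(q, state, rng):
    """Pick a highest-valued action, breaking ties at random."""
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def train(episodes=500, alpha=0.5, epsilon=0.1, seed=0):
    """Tabular Q-learning: the agent acts, observes a reward, and updates its estimates."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon, otherwise act greedily.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy_action(q, state, rng)
            next_state, reward, done = step(state, action)
            # Bootstrapped target: immediate reward plus discounted best future value.
            target = reward if done else reward + GAMMA * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


if __name__ == "__main__":
    q_table = train()
    policy = ["right" if q_table[s][1] > q_table[s][0] else "left"
              for s in range(N_STATES - 1)]
    print(policy)
```

In this toy setting the learned policy is simply the action with the larger Q-value in each state. Real benchmarks use far larger state spaces and usually function approximation instead of a table.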

Papers

Showing 11501-11550 of 15113 papers

| Title | Status | Hype |
|---|---|---|
| Deep Reinforcement Learning with Smooth Policy | | 0 |
| Improving the Generalization of Visual Navigation Policies using Invariance Regularization | | 0 |
| Breaking the Curse of Many Agents: Provable Mean Embedding Q-Iteration for Mean-Field Reinforcement Learning | | 0 |
| Long-Term Visitation Value for Deep Exploration in Sparse Reward Reinforcement Learning | Code | 0 |
| Double Reinforcement Learning for Efficient and Robust Off-Policy Evaluation | | 0 |
| Deep Reinforcement Learning with Implicit Human Feedback | | 0 |
| OPtions as REsponses: Grounding behavioural hierarchies in multi-agent reinforcement learning | | 0 |
| Reinforcement Learning with Goal-Distance Gradient | | 0 |
| Optimizing Multiagent Cooperation via Policy Evolution and Shared Experiences | | 0 |
| “Other-Play” for Zero-Shot Coordination | | 0 |
| Way Off-Policy Batch Deep Reinforcement Learning of Human Preferences in Dialog | | 0 |
| Reinforcement Learning with Differential Privacy | | 0 |
| Responsive Safety in Reinforcement Learning | | 0 |
| The Natural Lottery Ticket Winner: Reinforcement Learning with Ordinary Neural Circuits | | 0 |
| SVQN: Sequential Variational Soft Q-Learning Networks | | 0 |
| Reward-Conditioned Policies | Code | 0 |
| Uncertainty-Based Out-of-Distribution Classification in Deep Reinforcement Learning | | 0 |
| The Gambler's Problem and Beyond | | 0 |
| Information Theoretic Model Predictive Q-Learning | | 0 |
| A New Framework for Query Efficient Active Imitation Learning | | 0 |
| Deep Reinforced Self-Attention Masks for Abstractive Summarization (DR.SAS) | | 0 |
| World Programs for Model-Based Learning and Planning in Compositional State and Action Spaces | | 0 |
| Speeding up reinforcement learning by combining attention and agency features | | 0 |
| Real-time Policy Distillation in Deep Reinforcement Learning | | 0 |
| Augmented Replay Memory in Reinforcement Learning With Continuous Control | | 0 |
| Individual specialization in multi-task environments with multiagent reinforcement learners | | 0 |
| Computational model discovery with reinforcement learning | | 0 |
| SLM Lab: A Comprehensive Benchmark and Modular Software Framework for Reproducible Deep Reinforcement Learning | Code | 0 |
| Weak Supervision for Fake News Detection via Reinforcement Learning | Code | 0 |
| Quantum Logic Gate Synthesis as a Markov Decision Process | | 0 |
| Evolution Strategies Converges to Finite Differences | | 0 |
| Crowdfunding Dynamics Tracking: A Reinforcement Learning Approach | | 0 |
| Deep reinforcement learning for complex evaluation of one-loop diagrams in quantum field theory | | 0 |
| Quasi-Newton Trust Region Policy Optimization | | 0 |
| Learning to Combat Compounding-Error in Model-Based Reinforcement Learning | | 0 |
| Learning to Navigate Using Mid-Level Visual Priors | Code | 0 |
| A Survey of Deep Reinforcement Learning in Video Games | | 0 |
| Discrete and Continuous Action Representation for Practical RL in Video Games | Code | 0 |
| Hamilton-Jacobi-Bellman Equations for Q-Learning in Continuous Time | | 0 |
| Explain Your Move: Understanding Agent Actions Using Specific and Relevant Feature Attribution | Code | 0 |
| Direct and indirect reinforcement learning | | 0 |
| Variational Recurrent Models for Solving Partially Observable Control Tasks | Code | 0 |
| Parameterized Indexed Value Function for Efficient Exploration in Reinforcement Learning | Code | 0 |
| Towards Practical Multi-Object Manipulation using Relational Reinforcement Learning | Code | 0 |
| Monte-Carlo Tree Search for Policy Optimization | | 0 |
| Energy-Aware Multi-Server Mobile Edge Computing: A Deep Reinforcement Learning Approach | | 0 |
| Can Agents Learn by Analogy? An Inferable Model for PAC Reinforcement Learning | Code | 0 |
| Predictive Coding for Boosting Deep Reinforcement Learning with Sparse Rewards | | 0 |
| Online Reinforcement Learning of Optimal Threshold Policies for Markov Decision Processes | | 0 |
| Teaching robots to perceive time -- A reinforcement learning approach (Extended version) | | 0 |
Page 231 of 303

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PPG | Mean Normalized Performance | 0.76 | | Unverified |
| 2 | PPO | Mean Normalized Performance | 0.58 | | Unverified |