SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes the expected long-term reward.
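The interaction loop described above can be illustrated with a small sketch. This is not taken from any paper listed on this page: it assumes a hypothetical five-state corridor environment (`step`) and uses tabular Q-learning with epsilon-greedy exploration as one possible learning rule. All names and hyperparameters (`N_STATES`, `train`, `alpha`, `gamma`, `epsilon`) are illustrative assumptions.

```python
import random

N_STATES = 5        # hypothetical corridor: states 0..4, state 4 is the goal
ACTIONS = (-1, +1)  # action index 0 moves left, index 1 moves right


def step(state, action):
    """Toy environment transition: returns (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    # Reward of 1 at the goal, small penalty for every other step.
    reward = 1.0 if done else -0.01
    return next_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning; returns a table q[state][action_index]."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: explore with probability epsilon, else act greedily.
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Bootstrapped target: reward plus discounted best next-state value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Greedy action for each non-terminal state under the learned values."""
    return [ACTIONS[0] if qs[0] > qs[1] else ACTIONS[1] for qs in q[:-1]]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

In this toy setting the learned greedy policy should move right in every non-terminal state, since that is the shortest path to the only positive reward. Real benchmarks replace the corridor with a much richer environment and the table with a function approximator.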

Papers

Showing 3426-3450 of 15113 papers

| Title | Status | Hype |
|---|---|---|
| Countering Language Drift via Grounding | | 0 |
| A Study of Continual Learning Methods for Q-Learning | | 0 |
| Deep Knowledge Based Agent: Learning to do tasks by self-thinking about imaginary worlds | | 0 |
| A Study of AI Population Dynamics with Million-agent Reinforcement Learning | | 0 |
| Counterfactual Regularization for Model-Based Reinforcement Learning | | 0 |
| Deep Learning and Reinforcement Learning for Autonomous Unmanned Aerial Systems: Roadmap for Theory to Deployment | | 0 |
| A unified view of likelihood ratio and reparameterization gradients and an optimal importance sampling scheme | | 0 |
| A Unifying Framework for Action-Conditional Self-Predictive Reinforcement Learning | | 0 |
| Deep learning for molecular design - a review of the state of the art | | 0 |
| Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning | | 0 |
| Deep Learning in Earthquake Engineering: A Comprehensive Review | | 0 |
| Adaptive Dialog Policy Learning with Hindsight and User Modeling | | 0 |
| Deep Learning Interference Cancellation in Wireless Networks | | 0 |
| Counterfactual Multi-Agent Reinforcement Learning with Graph Convolution Communication | | 0 |
| Deep Learning of Intrinsically Motivated Options in the Arcade Learning Environment | | 0 |
| Deep Learning of Koopman Representation for Control | | 0 |
| Deep Learning of Robotic Tasks without a Simulator using Strong and Weak Human Supervision | | 0 |
| Deep Reinforcement Learning With Adaptive Combined Critics | | 0 |
| DeepLTL: Learning to Efficiently Satisfy Complex LTL Specifications for Multi-Task RL | | 0 |
| Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction | | 0 |
| Deep Reinforcement Learning with Discrete Normalized Advantage Functions for Resource Management in Network Slicing | | 0 |
| DeepMDP: Learning Continuous Latent Space Models for Representation Learning | | 0 |
| DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning | | 0 |
| Counterfactually Fair Reinforcement Learning via Sequential Data Preprocessing | | 0 |
| Agent-Centric Representations for Multi-Agent Reinforcement Learning | | 0 |
Page 138 of 605

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PPG | Mean Normalized Performance | 0.76 | | Unverified |
| 2 | PPO | Mean Normalized Performance | 0.58 | | Unverified |