SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes the long-term reward.
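The interaction loop described above can be sketched in a few lines. This is a minimal illustration, not a method taken from any paper listed here: it assumes a hypothetical five-state chain environment (reward 1 for reaching the rightmost state), uses tabular Q-learning as one common way to learn a policy, and assumes a discount factor `gamma` to define "long-term reward". All names (`step`, `train`, `greedy_policy`) are made up for the example.

```python
import random

N_STATES = 5        # hypothetical chain: states 0..4, state 4 is the goal
ACTIONS = (-1, +1)  # move left or move right


def step(state, action):
    """Environment: apply the action, return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action_index]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on episode length
            if rng.random() < epsilon:
                a = rng.randrange(2)  # explore
            else:
                best = max(q[state])  # exploit, breaking ties at random
                a = rng.choice([i for i in range(2) if q[state][i] == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Bootstrapped target: reward plus discounted best next value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Best action (as -1 or +1) in each non-terminal state."""
    return [ACTIONS[max(range(2), key=lambda i: q[s][i])]
            for s in range(N_STATES - 1)]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

In this toy setting the learned greedy policy should move right in every state, since that is the only way to reach the rewarded goal. Real benchmarks replace the chain with a much richer environment and the table with a function approximator.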

Papers

Showing 12651–12700 of 15113 papers

Title | Status | Hype
Inverse Reinforcement Learning in Contextual MDPs | Code | 0
From semantics to execution: Integrating action planning with reinforcement learning for robotic causal problem-solving | - | 0
Unknown mixing times in apprenticeship and reinforcement learning | - | 0
Scene Induced Multi-Modal Trajectory Forecasting via Planning | - | 0
Deep Q-Learning with Q-Matrix Transfer Learning for Novel Fire Evacuation Environment | - | 0
Estimating Risk and Uncertainty in Deep Reinforcement Learning | Code | 0
Hierarchical Reinforcement Learning for Quadruped Locomotion | - | 0
Deep Reinforcement Learning for Detecting Malicious Websites | - | 0
COBRA: Data-Efficient Model-Based RL through Unsupervised Object Discovery and Curiosity-Driven Exploration | Code | 0
Stochastic Inverse Reinforcement Learning | - | 0
Perceptual Values from Observation | - | 0
Reinforcement Learning without Ground-Truth State | - | 0
Stochastic Variance Reduction for Deep Q-learning | - | 0
A Bayesian Approach to Robust Reinforcement Learning | - | 0
Deep Reinforcement Learning Based Parameter Control in Differential Evolution | Code | 0
Issues concerning realizability of Blackwell optimal policies in reinforcement learning | - | 0
Reinforcement Learning for Learning of Dynamical Systems in Uncertain Environment: a Tutorial | - | 0
Evolving Rewards to Automate Reinforcement Learning | - | 0
A Regularized Opponent Model with Maximum Entropy Objective | Code | 0
Deep Reinforcement Learning-Based Channel Allocation for Wireless LANs with Graph Convolutional Networks | - | 0
Enforcing constraints for time series prediction in supervised, unsupervised and reinforcement learning | - | 0
Exact-K Recommendation via Maximal Clique Optimization | Code | 0
In Support of Over-Parametrization in Deep Reinforcement Learning: an Empirical Study | - | 0
Mastering the Game of Sungka from Random Play | Code | 0
MaMiC: Macro and Micro Curriculum for Robotic Reinforcement Learning | - | 0
Stochastically Dominant Distributional Reinforcement Learning | - | 0
Stratospheric Aerosol Injection as a Deep Reinforcement Learning Problem | - | 0
TBQ(σ): Improving Efficiency of Trace Utilization for Off-Policy Reinforcement Learning | - | 0
Meta Reinforcement Learning with Task Embedding and Shared Policy | Code | 0
Meta-Reinforcement Learning for Adaptive Autonomous Driving | - | 0
Random Expert Distillation: Imitation Learning via Expert Policy Support Estimation | Code | 0
QBSO-FS: A Reinforcement Learning Based Bee Swarm Optimization Metaheuristic for Feature Selection | Code | 0
Sub-policy Adaptation for Hierarchical Reinforcement Learning | - | 0
Learning Exploration Policies for Model-Agnostic Meta-Reinforcement Learning | - | 0
Knowledge-Based Sequential Decision-Making Under Uncertainty | - | 0
Goal-conditioned Imitation Learning | - | 0
Leveraging exploration in off-policy algorithms via normalizing flows | Code | 0
Deep Knowledge Based Agent: Learning to do tasks by self-thinking about imaginary worlds | - | 0
Autonomous Penetration Testing using Reinforcement Learning | - | 0
Addressing the Loss-Metric Mismatch with Adaptive Loss Alignment | - | 0
Deep reinforcement learning for scheduling in large-scale networked control systems | - | 0
Deep Reinforcement Learning for Scheduling in Cellular Networks | - | 0
Expressive Priors in Bayesian Neural Networks: Kernel Combinations and Periodic Functions | Code | 0
A Learning based Branch and Bound for Maximum Common Subgraph Problems | - | 0
Meta reinforcement learning as task inference | Code | 0
Reinforcement Learning for Robotics and Control with Active Uncertainty Reduction | - | 0
Variational Regret Bounds for Reinforcement Learning | - | 0
Trajectory-Based Off-Policy Deep Reinforcement Learning | Code | 0
TauRieL: Targeting Traveling Salesman Problem with a deep reinforcement learning inspired architecture | - | 0
Successor Options: An Option Discovery Framework for Reinforcement Learning | Code | 0
Page 254 of 303

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | - | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | - | Unverified