SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes long-term reward.
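As a rough illustration of the loop described above, here is a minimal sketch of tabular Q-learning on a toy task. Everything in it is an assumption made for illustration and is not taken from any paper listed on this page: the five-state chain environment, the `step`, `train`, and `greedy_policy` helpers, the discount factor `gamma` (one common way to formalize "long-term reward"), and the learning rate `alpha`.

```python
import random

N_STATES = 5          # hypothetical toy task: states 0..4 on a line, state 4 is the goal
ACTIONS = (-1, +1)    # action 0 moves left, action 1 moves right


def step(state, action):
    """Toy environment: reward 1.0 on reaching the goal, 0.0 otherwise."""
    next_state = min(max(state + ACTIONS[action], 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, max_steps=100, seed=0):
    """Learn action values Q[state][action] from interaction with the environment."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            # Explore uniformly at random; Q-learning is off-policy, so it can
            # still learn the values of the greedy policy from this behavior.
            action = rng.randrange(2)
            next_state, reward, done = step(state, action)
            # Target: immediate reward plus discounted best value of the next state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Pick the highest-valued action in every non-terminal state."""
    return [max(range(2), key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


q_table = train()
policy = greedy_policy(q_table)
```

Under these assumptions the learned policy should prefer moving right toward the goal, since each step of delay is discounted by `gamma`. Real problems typically replace the table with a function approximator and the random exploration with a more directed scheme.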

Papers

Showing 6651–6675 of 15113 papers

| Title | Status | Hype |
| --- | --- | --- |
| AnyMorph: Learning Transferable Polices By Inferring Agent Morphology | | 0 |
| Generalised Policy Improvement with Geometric Policy Composition | | 0 |
| Logic-based Reward Shaping for Multi-Agent Reinforcement Learning | Code | 0 |
| SafeRL-Kit: Evaluating Efficient Reinforcement Learning Methods for Safe Autonomous Driving | | 0 |
| The State of Sparse Training in Deep Reinforcement Learning | Code | 0 |
| A Look at Value-Based Decision-Time vs. Background Planning Methods Across Different Settings | | 0 |
| Reinforcement Learning-enhanced Shared-account Cross-domain Sequential Recommendation | Code | 0 |
| Reinforcement Learning for Economic Policy: A New Frontier? | | 0 |
| Double Check Your State Before Trusting It: Confidence-Aware Bidirectional Offline Model-Based Imagination | Code | 0 |
| Backbones-Review: Feature Extraction Networks for Deep Learning and Deep Reinforcement Learning Approaches | | 0 |
| Autonomous Platoon Control with Integrated Deep Reinforcement Learning and Dynamic Programming | | 0 |
| Automating the resolution of flight conflicts: Deep reinforcement learning in service of air traffic controllers | | 0 |
| Contrastive Learning as Goal-Conditioned Reinforcement Learning | | 0 |
| Rethinking Reinforcement Learning for Recommendation: A Prompt Perspective | | 0 |
| Revisiting Some Common Practices in Cooperative Multi-Agent Reinforcement Learning | | 0 |
| Mean-Semivariance Policy Optimization via Risk-Averse Reinforcement Learning | | 0 |
| Regularizing a Model-based Policy Stationary Distribution to Stabilize Offline Reinforcement Learning | Code | 0 |
| Open-Ended Learning Strategies for Learning Complex Locomotion Skills | | 0 |
| Solving the capacitated vehicle routing problem with timing windows using rollouts and MAX-SAT | | 0 |
| Robust Reinforcement Learning with Distributional Risk-averse formulation | | 0 |
| Towards a Solution to Bongard Problems: A Causal Approach | | 0 |
| Variance Reduction for Policy-Gradient Methods via Empirical Variance Minimization | | 0 |
| Visual Radial Basis Q-Network | | 0 |
| Stein Variational Goal Generation for adaptive Exploration in Multi-Goal Reinforcement Learning | | 0 |
| Universally Expressive Communication in Multi-Agent Reinforcement Learning | Code | 0 |
Page 267 of 605

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PPG | Mean Normalized Performance | 0.76 | | Unverified |
| 2 | PPO | Mean Normalized Performance | 0.58 | | Unverified |