SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to choose actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy mapping states to actions, that maximizes the expected long-term reward.
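The interaction loop and the cumulative-reward objective can be illustrated with tabular Q-learning, one standard RL algorithm among many. This is a minimal sketch under stated assumptions: the five-cell corridor environment, its +1 reward at the goal, and the hyperparameters (`alpha`, `gamma`, `epsilon`, number of episodes) are hypothetical choices made for illustration and do not come from any paper listed on this page.

```python
import random

# Hypothetical toy environment (assumption, not from the source): a 1-D
# corridor of N_STATES cells. The agent starts in cell 0; reaching the last
# cell ends the episode with reward +1, every other step gives reward 0.
N_STATES = 5
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Return (next_state, reward, done) for the toy corridor."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    if next_state == N_STATES - 1:
        return next_state, 1.0, True
    return next_state, 0.0, False


def discounted_return(rewards, gamma):
    """Cumulative reward G = r_1 + gamma*r_2 + gamma^2*r_3 + ..."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def q_learning(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning: estimate action values from reward feedback."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: explore at random (or on ties), else exploit.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])
            next_state, reward, done = step(state, action)
            # Bootstrap from the best next action unless the episode ended.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q = q_learning()
# Greedy action in each non-terminal state after learning.
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(policy)
print(discounted_return([0.0, 0.0, 0.0, 1.0], gamma=0.9))
```

In this toy setting the learned greedy policy moves right in every cell, because the discount factor makes reaching the reward sooner worth more; the epsilon-greedy rule is one simple way to trade off exploration against exploiting current value estimates.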

Papers

Showing 6651–6700 of 15113 papers

Title | Status | Hype
AnyMorph: Learning Transferable Polices By Inferring Agent Morphology | - | 0
Generalised Policy Improvement with Geometric Policy Composition | - | 0
Logic-based Reward Shaping for Multi-Agent Reinforcement Learning | Code | 0
SafeRL-Kit: Evaluating Efficient Reinforcement Learning Methods for Safe Autonomous Driving | - | 0
The State of Sparse Training in Deep Reinforcement Learning | Code | 0
A Look at Value-Based Decision-Time vs. Background Planning Methods Across Different Settings | - | 0
Reinforcement Learning-enhanced Shared-account Cross-domain Sequential Recommendation | Code | 0
Reinforcement Learning for Economic Policy: A New Frontier? | - | 0
Double Check Your State Before Trusting It: Confidence-Aware Bidirectional Offline Model-Based Imagination | Code | 0
Backbones-Review: Feature Extraction Networks for Deep Learning and Deep Reinforcement Learning Approaches | - | 0
Autonomous Platoon Control with Integrated Deep Reinforcement Learning and Dynamic Programming | - | 0
Automating the resolution of flight conflicts: Deep reinforcement learning in service of air traffic controllers | - | 0
Contrastive Learning as Goal-Conditioned Reinforcement Learning | - | 0
Rethinking Reinforcement Learning for Recommendation: A Prompt Perspective | - | 0
Revisiting Some Common Practices in Cooperative Multi-Agent Reinforcement Learning | - | 0
Mean-Semivariance Policy Optimization via Risk-Averse Reinforcement Learning | - | 0
Regularizing a Model-based Policy Stationary Distribution to Stabilize Offline Reinforcement Learning | Code | 0
Open-Ended Learning Strategies for Learning Complex Locomotion Skills | - | 0
Solving the capacitated vehicle routing problem with timing windows using rollouts and MAX-SAT | - | 0
Robust Reinforcement Learning with Distributional Risk-averse formulation | - | 0
Towards a Solution to Bongard Problems: A Causal Approach | - | 0
Variance Reduction for Policy-Gradient Methods via Empirical Variance Minimization | - | 0
Visual Radial Basis Q-Network | - | 0
Stein Variational Goal Generation for adaptive Exploration in Multi-Goal Reinforcement Learning | - | 0
Universally Expressive Communication in Multi-Agent Reinforcement Learning | Code | 0
Defending Observation Attacks in Deep Reinforcement Learning via Detection and Denoising | Code | 0
FreeKD: Free-direction Knowledge Distillation for Graph Neural Networks | - | 0
Deep Reinforcement Learning for Exact Combinatorial Optimization: Learning to Branch | - | 0
Computation Offloading and Resource Allocation in F-RANs: A Federated Deep Reinforcement Learning Approach | - | 0
Intrinsically motivated option learning: a comparative study of recent methods | - | 0
IGN : Implicit Generative Networks | Code | 0
Analysis of Randomization Effects on Sim2Real Transfer in Reinforcement Learning for Robotic Manipulation Tasks | - | 0
Provable Benefit of Multitask Representation Learning in Reinforcement Learning | - | 0
Relative Policy-Transition Optimization for Fast Policy Transfer | - | 0
Provably Efficient Offline Reinforcement Learning with Trajectory-Wise Reward | - | 0
RL-GA: A Reinforcement Learning-Based Genetic Algorithm for Electromagnetic Detection Satellite Scheduling Problem | - | 0
Matching options to tasks using Option-Indexed Hierarchical Reinforcement Learning | - | 0
Case-Based Inverse Reinforcement Learning Using Temporal Coherence | Code | 0
Deep Reinforcement Learning for Optimal Investment and Saving Strategy Selection in Heterogeneous Profiles: Intelligent Agents working towards retirement | - | 0
Federated Offline Reinforcement Learning | - | 0
Does Self-supervised Learning Really Improve Reinforcement Learning from Pixels? | Code | 0
An application of neural networks to a problem in knot theory and group theory (untangling braids) | - | 0
Deep Multi-Agent Reinforcement Learning with Hybrid Action Spaces based on Maximum Entropy | - | 0
Anchor-Changing Regularized Natural Policy Gradient for Multi-Objective Reinforcement Learning | Code | 0
Large-Scale Retrieval for Reinforcement Learning | - | 0
Social Network Structure Shapes Innovation: Experience-sharing in RL with SAPIENS | - | 0
Multifidelity Reinforcement Learning with Control Variates | - | 0
Policy Gradient Reinforcement Learning for Uncertain Polytopic LPV Systems based on MHE-MPC | - | 0
Regret Bounds for Information-Directed Reinforcement Learning | - | 0
Sample-Efficient Reinforcement Learning in the Presence of Exogenous Information | - | 0
Page 134 of 303

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | - | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | - | Unverified