SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment to maximize a cumulative reward signal. The agent interacts with the environment and learns by receiving feedback in the form of rewards or punishments for its actions. The goal of reinforcement learning is to find the optimal policy or decision-making strategy that maximizes the long-term reward.
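The interaction loop described above can be illustrated with a minimal sketch. It uses tabular Q-learning, chosen here only as one simple example of an RL method, on a hypothetical five-state corridor. The environment, the reward values, and every name in the code (`step`, `train`, `greedy_policy`, the hyperparameters) are assumptions made for illustration and do not come from any paper listed on this page.

```python
import random

# Hypothetical toy environment: a corridor of states 0..4, with state 4 as the goal.
N_STATES = 5
ACTIONS = (-1, +1)  # index 0 moves left, index 1 moves right


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    # Assumed reward scheme: +1 on reaching the goal, a small penalty per other step.
    reward = 1.0 if done else -0.01
    return next_state, reward, done


def train(episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Learn action values from reward feedback with epsilon-greedy Q-learning."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action_index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon, otherwise act greedily.
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[state][i])
            next_state, reward, done = step(state, ACTIONS[a])
            # Bootstrapped target: immediate reward plus discounted future value.
            future = 0.0 if done else gamma * max(q[next_state])
            q[state][a] += alpha * (reward + future - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Return the preferred action for each non-terminal state."""
    return [
        ACTIONS[max(range(len(ACTIONS)), key=lambda i: q[s][i])]
        for s in range(N_STATES - 1)
    ]
```

Calling `greedy_policy(train())` returns the action the learned policy prefers in each non-terminal state. In this toy setting the agent should learn to move right toward the goal, since that maximizes the discounted long-term reward.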

Papers

Showing 6776-6800 of 15113 papers

| Title | Status | Hype |
|---|---|---|
| Neural Improvement Heuristics for Graph Combinatorial Optimization Problems | Code | 0 |
| The Phenomenon of Policy Churn | - | 0 |
| DM^2: Decentralized Multi-Agent Reinforcement Learning for Distribution Matching | Code | 0 |
| Efficient Scheduling of Data Augmentation for Deep Reinforcement Learning | - | 0 |
| Byzantine-Robust Online and Offline Distributed Reinforcement Learning | - | 0 |
| Know Your Boundaries: The Necessity of Explicit Behavioral Cloning in Offline RL | - | 0 |
| A Database of Multimodal Data to Construct a Simulated Dialogue Partner with Varying Degrees of Cognitive Health | - | 0 |
| A Meta Reinforcement Learning Approach for Predictive Autoscaling in the Cloud | Code | 0 |
| A Mixture-of-Expert Approach to RL-based Dialogue Management | - | 0 |
| Lessons Learned from Data-Driven Building Control Experiments: Contrasting Gaussian Process-based MPC, Bilevel DeePC, and Deep Reinforcement Learning | - | 0 |
| Graph Backup: Data Efficient Backup Exploiting Markovian Transitions | Code | 0 |
| k-Means Maximum Entropy Exploration | - | 0 |
| Provable General Function Class Representation Learning in Multitask Bandits and MDPs | - | 0 |
| One Policy is Enough: Parallel Exploration with a Single Policy is Near-Optimal for Reward-Free Reinforcement Learning | - | 0 |
| Multi-Agent Learning of Numerical Methods for Hyperbolic PDEs with Factored Dec-MDP | - | 0 |
| Timing is Everything: Learning to Act Selectively with Costly Actions and Budgetary Constraints | - | 0 |
| Robust Longitudinal Control for Vehicular Autonomous Platoons Using Deep Reinforcement Learning | - | 0 |
| Sample-Efficient, Exploration-Based Policy Optimisation for Routing Problems | - | 0 |
| Nearly Minimax Optimal Offline Reinforcement Learning with Linear Function Approximation: Single-Agent MDP and Markov Game | - | 0 |
| Non-Markovian Reward Modelling from Trajectory Labels via Interpretable Multiple Instance Learning | Code | 0 |
| Residual Q-Networks for Value Function Factorizing in Multi-Agent Reinforcement Learning | - | 0 |
| Quantum Multi-Armed Bandits and Stochastic Linear Bandits Enjoy Logarithmic Regrets | - | 0 |
| Reinforcement Learning with a Terminator | Code | 0 |
| SEREN: Knowing When to Explore and When to Exploit | - | 0 |
| Stock Trading Optimization through Model-based Reinforcement Learning with Resistance Support Relative Strength | - | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PPG | Mean Normalized Performance | 0.76 | - | Unverified |
| 2 | PPO | Mean Normalized Performance | 0.58 | - | Unverified |