SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment to maximize a cumulative reward signal. The agent interacts with the environment and learns by receiving feedback in the form of rewards or punishments for its actions. The goal of reinforcement learning is to find the optimal policy or decision-making strategy that maximizes the long-term reward.
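The interaction loop described above can be sketched in a few lines. This is a minimal illustration, not a method taken from any paper listed here: the five-cell corridor task, the names `step`, `train`, and `greedy_policy`, and all hyperparameter values are hypothetical, and tabular Q-learning is used only as one common way of learning a policy from reward feedback.

```python
import random

N_STATES = 5        # hypothetical corridor with cells 0..4; cell 4 is the goal
ACTIONS = (-1, 1)   # step left or step right


def step(state, action):
    """Environment: move along the corridor; reward 1.0 only on reaching the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Agent: tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action_index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon, or when both estimates are tied.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Bootstrapped target: immediate reward plus discounted future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Read the learned policy: the highest-valued action in each non-goal cell."""
    return [ACTIONS[max(range(2), key=lambda a: q[s][a])]
            for s in range(N_STATES - 1)]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

In this toy setting the learned policy steps right in every cell, since that is the shortest route to the only rewarded state. The discount factor `gamma` is what makes the agent weigh long-term reward rather than only the immediate one.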

Papers

Showing 5501-5550 of 15113 papers

Title | Status | Hype
Disentangling Epistemic and Aleatoric Uncertainty in Reinforcement Learning | | 0
KCRL: Krasovskii-Constrained Reinforcement Learning with Guaranteed Stability in Nonlinear Dynamical Systems | | 0
Reinforcement Learning with Neural Radiance Fields | | 0
Offline Reinforcement Learning with Causal Structured World Models | | 0
Joint Energy Dispatch and Unit Commitment in Microgrids Based on Deep Reinforcement Learning | | 0
Reincarnating Reinforcement Learning: Reusing Prior Computation to Accelerate Progress | Code | 1
Incrementality Bidding via Reinforcement Learning under Mixed and Delayed Rewards | | 0
HEX: Human-in-the-loop Explainability via Deep Reinforcement Learning | | 0
Equivariant Reinforcement Learning for Quadrotor UAV | | 0
RACA: Relation-Aware Credit Assignment for Ad-Hoc Cooperation in Multi-Agent Deep Reinforcement Learning | | 0
Posterior Coreset Construction with Kernelized Stein Discrepancy for Model-Based Reinforcement Learning | | 0
Sample-Efficient Reinforcement Learning of Partially Observable Markov Games | | 0
Offline Reinforcement Learning with Differential Privacy | | 0
When does return-conditioned supervised learning work for offline reinforcement learning? | Code | 1
Reinforcement learning based parameters adaption method for particle swarm optimization | | 0
NeuralSympCheck: A Symptom Checking and Disease Diagnostic Neural Model with Logic Regularization | Code | 1
Policy Gradient Algorithms with Monte Carlo Tree Learning for Non-Markov Decision Processes | | 0
Deep Transformer Q-Networks for Partially Observable Reinforcement Learning | Code | 1
Incorporating Explicit Uncertainty Estimates into Deep Offline Reinforcement Learning | | 0
A Database of Multimodal Data to Construct a Simulated Dialogue Partner with Varying Degrees of Cognitive Health | | 0
RLSS: A Deep Reinforcement Learning Algorithm for Sequential Scene Generation | | 0
ResAct: Reinforcing Long-term Engagement in Sequential Recommendation with Residual Actor | Code | 1
Model Generation with Provable Coverability for Offline Reinforcement Learning | | 0
Neural Improvement Heuristics for Graph Combinatorial Optimization Problems | Code | 0
Provably Efficient Offline Multi-agent Reinforcement Learning via Strategy-wise Bonus | | 0
On Reinforcement Learning and Distribution Matching for Fine-Tuning Language Models with no Catastrophic Forgetting | Code | 1
Predecessor Features | | 0
Provably Efficient Lifelong Reinforcement Learning with Linear Function Approximation | | 0
On Gap-dependent Bounds for Offline Reinforcement Learning | | 0
The Phenomenon of Policy Churn | | 0
Know Your Boundaries: The Necessity of Explicit Behavioral Cloning in Offline RL | | 0
DM^2: Decentralized Multi-Agent Reinforcement Learning for Distribution Matching | Code | 0
Efficient Scheduling of Data Augmentation for Deep Reinforcement Learning | | 0
Byzantine-Robust Online and Offline Distributed Reinforcement Learning | | 0
IGLU Gridworld: Simple and Fast Environment for Embodied Dialog Agents | Code | 1
A Mixture-of-Expert Approach to RL-based Dialogue Management | | 0
Human-AI Shared Control via Policy Dissection | Code | 2
Robust Longitudinal Control for Vehicular Autonomous Platoons Using Deep Reinforcement Learning | | 0
Nearly Minimax Optimal Offline Reinforcement Learning with Linear Function Approximation: Single-Agent MDP and Markov Game | | 0
Provable General Function Class Representation Learning in Multitask Bandits and MDPs | | 0
Timing is Everything: Learning to Act Selectively with Costly Actions and Budgetary Constraints | | 0
Multi-Agent Learning of Numerical Methods for Hyperbolic PDEs with Factored Dec-MDP | | 0
One Policy is Enough: Parallel Exploration with a Single Policy is Near-Optimal for Reward-Free Reinforcement Learning | | 0
Sample-Efficient, Exploration-Based Policy Optimisation for Routing Problems | | 0
k-Means Maximum Entropy Exploration | | 0
Graph Backup: Data Efficient Backup Exploiting Markovian Transitions | Code | 0
Lessons Learned from Data-Driven Building Control Experiments: Contrasting Gaussian Process-based MPC, Bilevel DeePC, and Deep Reinforcement Learning | | 0
A Meta Reinforcement Learning Approach for Predictive Autoscaling in the Cloud | Code | 0
DEP-RL: Embodied Exploration for Reinforcement Learning in Overactuated and Musculoskeletal Systems | Code | 2
A Simulation Environment and Reinforcement Learning Method for Waste Reduction | | 0
Page 111 of 303

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | | Unverified