SOTAVerified

Offline RL

Papers

Showing 201-225 of 755 papers

Title | Status | Hype
Robust Bandwidth Estimation for Real-Time Communication with Offline Reinforcement Learning | - | 0
Optimal Single-Policy Sample Complexity and Transient Coverage for Average-Reward Offline RL | - | 0
Flow-Based Single-Step Completion for Efficient and Expressive Policy Learning | - | 0
Sparse-Reg: Improving Sample Complexity in Offline Reinforcement Learning using Sparsity | Code | 0
CAWR: Corruption-Averse Advantage-Weighted Regression for Robust Policy Optimization | Code | 0
IntelliLung: Advancing Safe Mechanical Ventilation using Offline RL with Hybrid Actions and Clinically Aligned Rewards | - | 0
Toward Explainable Offline RL: Analyzing Representations in Intrinsically Motivated Decision Transformers | - | 0
DR-SAC: Distributionally Robust Soft Actor-Critic for Reinforcement Learning under Uncertainty | Code | 0
MOORL: A Framework for Integrating Offline-Online Reinforcement Learning | - | 0
Semi-gradient DICE for Offline Constrained Reinforcement Learning | - | 0
Policy-Based Trajectory Clustering in Offline Reinforcement Learning | - | 0
MOBODY: Model Based Off-Dynamics Offline Reinforcement Learning | Code | 0
Offline RL with Smooth OOD Generalization in Convex Hull and its Neighborhood | Code | 0
How to Provably Improve Return Conditioned Supervised Learning? | - | 0
Accelerating Diffusion Models in Offline RL via Reward-Aware Consistency Trajectory Distillation | - | 0
Learning to Clarify by Reinforcement Learning Through Reward-Weighted Fine-Tuning | - | 0
Enhanced DACER Algorithm with High Diffusion Efficiency | - | 0
ADG: Ambient Diffusion-Guided Dataset Recovery for Corruption-Robust Offline Reinforcement Learning | - | 0
SOReL and TOReL: Two Methods for Fully Offline Reinforcement Learning | Code | 0
Scaling Offline RL via Efficient and Expressive Shortcut Models | - | 0
Learning to Trust Bellman Updates: Selective State-Adaptive Regularization for Offline RL | Code | 0
GenPO: Generative Diffusion Models Meet On-Policy Reinforcement Learning | - | 0
Diffusion Self-Weighted Guidance for Offline Reinforcement Learning | - | 0
Efficient Online RL Fine Tuning with Offline Pre-trained Policy Only | - | 0
Offline Guarded Safe Reinforcement Learning for Medical Treatment Optimization Strategies | - | 0
Page 9 of 31

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | KFC | Average Reward | 81.8 | - | Unverified
2 | ADMPO | Average Reward | 81 | - | Unverified
3 | Decision Transformer (DT) | Average Reward | 73.5 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ParPI | D4RL Normalized Score | 151.4 | - | Unverified