SOTAVerified

Offline RL

Papers

Showing 1-25 of 755 papers

Title | Status | Hype
From Novelty to Imitation: Self-Distilled Rewards for Offline Reinforcement Learning | - | 0
Step-wise Policy for Rare-tool Knowledge (SPaRK): Offline RL that Drives Diverse Tool Use in LLMs | Code | 0
Robust Bandwidth Estimation for Real-Time Communication with Offline Reinforcement Learning | - | 0
Flow-Based Single-Step Completion for Efficient and Expressive Policy Learning | - | 0
Optimal Single-Policy Sample Complexity and Transient Coverage for Average-Reward Offline RL | - | 0
Sparse-Reg: Improving Sample Complexity in Offline Reinforcement Learning using Sparsity | Code | 0
CAWR: Corruption-Averse Advantage-Weighted Regression for Robust Policy Optimization | Code | 0
IntelliLung: Advancing Safe Mechanical Ventilation using Offline RL with Hybrid Actions and Clinically Aligned Rewards | - | 0
Toward Explainable Offline RL: Analyzing Representations in Intrinsically Motivated Decision Transformers | - | 0
DR-SAC: Distributionally Robust Soft Actor-Critic for Reinforcement Learning under Uncertainty | Code | 0
MOORL: A Framework for Integrating Offline-Online Reinforcement Learning | - | 0
Policy-Based Trajectory Clustering in Offline Reinforcement Learning | - | 0
MOBODY: Model Based Off-Dynamics Offline Reinforcement Learning | Code | 0
Offline RL with Smooth OOD Generalization in Convex Hull and its Neighborhood | Code | 0
Semi-gradient DICE for Offline Constrained Reinforcement Learning | - | 0
How to Provably Improve Return Conditioned Supervised Learning? | - | 0
Accelerating Diffusion Models in Offline RL via Reward-Aware Consistency Trajectory Distillation | - | 0
Learning to Clarify by Reinforcement Learning Through Reward-Weighted Fine-Tuning | - | 0
ADG: Ambient Diffusion-Guided Dataset Recovery for Corruption-Robust Offline Reinforcement Learning | - | 0
Enhanced DACER Algorithm with High Diffusion Efficiency | - | 0
Diffusion Guidance Is a Controllable Policy Improvement Operator | Code | 2
SOReL and TOReL: Two Methods for Fully Offline Reinforcement Learning | Code | 0
Scaling Offline RL via Efficient and Expressive Shortcut Models | - | 0
Learning to Trust Bellman Updates: Selective State-Adaptive Regularization for Offline RL | Code | 0
GenPO: Generative Diffusion Models Meet On-Policy Reinforcement Learning | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | KFC | Average Reward | 81.8 | - | Unverified
2 | ADMPO | Average Reward | 81 | - | Unverified
3 | Decision Transformer (DT) | Average Reward | 73.5 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ParPI | D4RL Normalized Score | 151.4 | - | Unverified