SOTAVerified

Offline RL

Papers

Showing 1-50 of 755 papers

Title | Status | Hype
From Novelty to Imitation: Self-Distilled Rewards for Offline Reinforcement Learning | | 0
Step-wise Policy for Rare-tool Knowledge (SPaRK): Offline RL that Drives Diverse Tool Use in LLMs | Code | 0
Robust Bandwidth Estimation for Real-Time Communication with Offline Reinforcement Learning | | 0
Optimal Single-Policy Sample Complexity and Transient Coverage for Average-Reward Offline RL | | 0
Flow-Based Single-Step Completion for Efficient and Expressive Policy Learning | | 0
Sparse-Reg: Improving Sample Complexity in Offline Reinforcement Learning using Sparsity | Code | 0
CAWR: Corruption-Averse Advantage-Weighted Regression for Robust Policy Optimization | Code | 0
IntelliLung: Advancing Safe Mechanical Ventilation using Offline RL with Hybrid Actions and Clinically Aligned Rewards | | 0
Toward Explainable Offline RL: Analyzing Representations in Intrinsically Motivated Decision Transformers | | 0
DR-SAC: Distributionally Robust Soft Actor-Critic for Reinforcement Learning under Uncertainty | Code | 0
MOORL: A Framework for Integrating Offline-Online Reinforcement Learning | | 0
Policy-Based Trajectory Clustering in Offline Reinforcement Learning | | 0
Semi-gradient DICE for Offline Constrained Reinforcement Learning | | 0
MOBODY: Model Based Off-Dynamics Offline Reinforcement Learning | Code | 0
Offline RL with Smooth OOD Generalization in Convex Hull and its Neighborhood | Code | 0
How to Provably Improve Return Conditioned Supervised Learning? | | 0
Accelerating Diffusion Models in Offline RL via Reward-Aware Consistency Trajectory Distillation | | 0
Learning to Clarify by Reinforcement Learning Through Reward-Weighted Fine-Tuning | | 0
ADG: Ambient Diffusion-Guided Dataset Recovery for Corruption-Robust Offline Reinforcement Learning | | 0
Enhanced DACER Algorithm with High Diffusion Efficiency | | 0
Diffusion Guidance Is a Controllable Policy Improvement Operator | Code | 2
SOReL and TOReL: Two Methods for Fully Offline Reinforcement Learning | Code | 0
Scaling Offline RL via Efficient and Expressive Shortcut Models | | 0
Learning to Trust Bellman Updates: Selective State-Adaptive Regularization for Offline RL | Code | 0
GenPO: Generative Diffusion Models Meet On-Policy Reinforcement Learning | | 0
Diffusion Self-Weighted Guidance for Offline Reinforcement Learning | | 0
PyTupli: A Scalable Infrastructure for Collaborative Offline Reinforcement Learning Projects | Code | 0
Efficient Online RL Fine Tuning with Offline Pre-trained Policy Only | | 0
Offline Guarded Safe Reinforcement Learning for Medical Treatment Optimization Strategies | | 0
Unearthing Gems from Stones: Policy Optimization with Negative Sample Augmentation for LLM Reasoning | | 0
Think-J: Learning to Think for Generative LLM-as-a-Judge | Code | 0
Your Offline Policy is Not Trustworthy: Bilevel Reinforcement Learning for Sequential Portfolio Optimization | | 0
Prior-Guided Diffusion Planning for Offline Reinforcement Learning | | 0
ImagineBench: Evaluating Reinforcement Learning with Large Language Model Rollouts | Code | 1
Reinforcement Learning for Individual Optimal Policy from Heterogeneous Data | | 0
Feasibility-Aware Pessimistic Estimation: Toward Long-Horizon Safety in Offline RL | | 0
What Matters for Batch Online Reinforcement Learning in Robotics? | | 0
Cache-Efficient Posterior Sampling for Reinforcement Learning with LLM-Derived Priors Across Discrete and Continuous Domains | | 0
Video-Enhanced Offline Reinforcement Learning: A Model-Based Approach | | 0
Pretraining a Shared Q-Network for Data-Efficient Offline Reinforcement Learning | | 0
Taming OOD Actions for Offline Reinforcement Learning: An Advantage-Based Approach | | 0
Exploring the Potential of Offline RL for Reasoning in LLMs: A Preliminary Study | | 0
Analytic Energy-Guided Policy Optimization for Offline Reinforcement Learning | | 0
Offline Robotic World Model: Learning Robotic Policies without a Physics Simulator | | 0
VIPO: Value Function Inconsistency Penalized Offline Reinforcement Learning | | 0
A Clean Slate for Offline Reinforcement Learning | Code | 3
Towards Optimal Differentially Private Regret Bounds in Linear MDPs | | 0
Decision SpikeFormer: Spike-Driven Transformer for Decision Making | | 0
Model-Based Offline Reinforcement Learning with Adversarial Data Augmentation | | 0
Offline Reinforcement Learning with Discrete Diffusion Skills | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | KFC | Average Reward | 81.8 | | Unverified
2 | ADMPO | Average Reward | 81 | | Unverified
3 | Decision Transformer (DT) | Average Reward | 73.5 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ParPI | D4RL Normalized Score | 151.4 | | Unverified