SOTAVerified

Offline RL

Papers

Showing 201-250 of 755 papers

Title | Status | Hype
Robust Bandwidth Estimation for Real-Time Communication with Offline Reinforcement Learning |  | 0
Flow-Based Single-Step Completion for Efficient and Expressive Policy Learning |  | 0
Optimal Single-Policy Sample Complexity and Transient Coverage for Average-Reward Offline RL |  | 0
Sparse-Reg: Improving Sample Complexity in Offline Reinforcement Learning using Sparsity | Code | 0
CAWR: Corruption-Averse Advantage-Weighted Regression for Robust Policy Optimization | Code | 0
IntelliLung: Advancing Safe Mechanical Ventilation using Offline RL with Hybrid Actions and Clinically Aligned Rewards |  | 0
Toward Explainable Offline RL: Analyzing Representations in Intrinsically Motivated Decision Transformers |  | 0
DR-SAC: Distributionally Robust Soft Actor-Critic for Reinforcement Learning under Uncertainty | Code | 0
MOORL: A Framework for Integrating Offline-Online Reinforcement Learning |  | 0
Offline RL with Smooth OOD Generalization in Convex Hull and its Neighborhood | Code | 0
MOBODY: Model Based Off-Dynamics Offline Reinforcement Learning | Code | 0
Policy-Based Trajectory Clustering in Offline Reinforcement Learning |  | 0
Semi-gradient DICE for Offline Constrained Reinforcement Learning |  | 0
How to Provably Improve Return Conditioned Supervised Learning? |  | 0
Accelerating Diffusion Models in Offline RL via Reward-Aware Consistency Trajectory Distillation |  | 0
Learning to Clarify by Reinforcement Learning Through Reward-Weighted Fine-Tuning |  | 0
Enhanced DACER Algorithm with High Diffusion Efficiency |  | 0
ADG: Ambient Diffusion-Guided Dataset Recovery for Corruption-Robust Offline Reinforcement Learning |  | 0
Scaling Offline RL via Efficient and Expressive Shortcut Models |  | 0
SOReL and TOReL: Two Methods for Fully Offline Reinforcement Learning | Code | 0
Learning to Trust Bellman Updates: Selective State-Adaptive Regularization for Offline RL | Code | 0
GenPO: Generative Diffusion Models Meet On-Policy Reinforcement Learning |  | 0
Diffusion Self-Weighted Guidance for Offline Reinforcement Learning |  | 0
Offline Guarded Safe Reinforcement Learning for Medical Treatment Optimization Strategies |  | 0
Efficient Online RL Fine Tuning with Offline Pre-trained Policy Only |  | 0
PyTupli: A Scalable Infrastructure for Collaborative Offline Reinforcement Learning Projects | Code | 0
Think-J: Learning to Think for Generative LLM-as-a-Judge | Code | 0
Unearthing Gems from Stones: Policy Optimization with Negative Sample Augmentation for LLM Reasoning |  | 0
Your Offline Policy is Not Trustworthy: Bilevel Reinforcement Learning for Sequential Portfolio Optimization |  | 0
Prior-Guided Diffusion Planning for Offline Reinforcement Learning |  | 0
Reinforcement Learning for Individual Optimal Policy from Heterogeneous Data |  | 0
Feasibility-Aware Pessimistic Estimation: Toward Long-Horizon Safety in Offline RL |  | 0
What Matters for Batch Online Reinforcement Learning in Robotics? |  | 0
Cache-Efficient Posterior Sampling for Reinforcement Learning with LLM-Derived Priors Across Discrete and Continuous Domains |  | 0
Video-Enhanced Offline Reinforcement Learning: A Model-Based Approach |  | 0
Pretraining a Shared Q-Network for Data-Efficient Offline Reinforcement Learning |  | 0
Taming OOD Actions for Offline Reinforcement Learning: An Advantage-Based Approach |  | 0
Exploring the Potential of Offline RL for Reasoning in LLMs: A Preliminary Study |  | 0
Analytic Energy-Guided Policy Optimization for Offline Reinforcement Learning |  | 0
Offline Robotic World Model: Learning Robotic Policies without a Physics Simulator |  | 0
VIPO: Value Function Inconsistency Penalized Offline Reinforcement Learning |  | 0
Towards Optimal Differentially Private Regret Bounds in Linear MDPs |  | 0
Decision SpikeFormer: Spike-Driven Transformer for Decision Making |  | 0
Model-Based Offline Reinforcement Learning with Adversarial Data Augmentation |  | 0
Offline Reinforcement Learning with Discrete Diffusion Skills |  | 0
Behaviour Discovery and Attribution for Explainable Reinforcement Learning |  | 0
Evaluation-Time Policy Switching for Offline Reinforcement Learning |  | 0
The Pitfalls of Imitation Learning when Actions are Continuous |  | 0
Policy Regularization on Globally Accessible States in Cross-Dynamics Reinforcement Learning |  | 0
Policy Constraint by Only Support Constraint for Offline Reinforcement Learning | Code | 0
Page 5 of 16

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | KFC | Average Reward | 81.8 |  | Unverified
2 | ADMPO | Average Reward | 81 |  | Unverified
3 | Decision Transformer (DT) | Average Reward | 73.5 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ParPI | D4RL Normalized Score | 151.4 |  | Unverified