SOTAVerified

Offline RL

Papers

Showing 301–350 of 755 papers

Title | Status | Hype
Decision Mamba: A Multi-Grained State Space Model with Self-Evolution Regularization for Offline RL | Code | 0
On Practical Reinforcement Learning: Provable Robustness, Scalability, and Statistical Efficiency | Code | 0
From Novelty to Imitation: Self-Distilled Rewards for Offline Reinforcement Learning | | 0
FOSP: Fine-tuning Offline Safe Policy through World Models | | 0
Contrastive Value Learning: Implicit Models for Simple Offline RL | | 0
Behavioral Entropy-Guided Dataset Generation for Offline Reinforcement Learning | | 0
Flow-Based Single-Step Completion for Efficient and Expressive Policy Learning | | 0
Flexible Blood Glucose Control: Offline Reinforcement Learning from Human Feedback | | 0
Contrastive Learning as Goal-Conditioned Reinforcement Learning | | 0
BECAUSE: Bilinear Causal Representation for Generalizable Offline Model-based Reinforcement Learning | | 0
Finetuning Offline World Models in the Real World | | 0
Finetuning from Offline Reinforcement Learning: Challenges, Trade-offs and Practical Solutions | | 0
Contrastive Diffuser: Planning Towards High Return States via Contrastive Learning | | 0
Finer Behavioral Foundation Models via Auto-Regressive Features and Advantage Weighting | | 0
Fighting Uncertainty with Gradients: Offline Reinforcement Learning via Diffusion Score Matching | | 0
BCRLSP: An Offline Reinforcement Learning Framework for Sequential Targeted Promotion | | 0
Federated Offline Reinforcement Learning: Collaborative Single-Policy Coverage Suffices | | 0
Federated Offline Reinforcement Learning | | 0
Contextual Transformer for Offline Meta Reinforcement Learning | | 0
Feasibility-Aware Pessimistic Estimation: Toward Long-Horizon Safety in Offline RL | | 0
Context-Former: Stitching via Latent Conditioned Sequence Modeling | | 0
Bayesian Reparameterization of Reward-Conditioned Reinforcement Learning with Energy-based Models | | 0
AdaCred: Adaptive Causal Decision Transformers with Feature Crediting | | 0
A Tractable Inference Perspective of Offline RL | | 0
Exploring the Potential of Offline RL for Reasoning in LLMs: A Preliminary Study | | 0
Exploiting Generalization in Offline Reinforcement Learning via Unseen State Augmentations | | 0
Constraints Penalized Q-learning for Safe Offline Reinforcement Learning | | 0
Exclusively Penalized Q-learning for Offline Reinforcement Learning | | 0
Evaluation-Time Policy Switching for Offline Reinforcement Learning | | 0
Evaluation of Active Feature Acquisition Methods for Static Feature Settings | | 0
Equivariant Offline Reinforcement Learning | | 0
Equivariant Data Augmentation for Generalization in Offline Reinforcement Learning | | 0
Environment Transformer and Policy Optimization for Model-Based Offline Reinforcement Learning | | 0
Ensemble Successor Representations for Task Generalization in Offline-to-Online Reinforcement Learning | | 0
Conservative Data Sharing for Multi-Task Offline Reinforcement Learning | | 0
Batch-Constrained Distributional Reinforcement Learning for Session-based Recommendation | | 0
Align Your Intents: Offline Imitation Learning via Optimal Transport | | 0
Achieving Fairness in Multi-Agent Markov Decision Processes Using Reinforcement Learning | | 0
ENOTO: Improving Offline-to-Online Reinforcement Learning with Q-Ensembles | | 0
Confidence-Conditioned Value Functions for Offline Reinforcement Learning | | 0
Enhancing Reinforcement Learning Through Guided Search | | 0
Enhancing Pre-Trained Decision Transformers with Prompt-Tuning Bandits | | 0
A Validation Tool for Designing Reinforcement Learning Environments | | 0
Enhancing Offline Model-Based RL via Active Model Selection: A Bayesian Optimization Perspective | | 0
Enhancing Cross-domain Pre-Trained Decision Transformers with Adaptive Attention | | 0
Enhanced DACER Algorithm with High Diffusion Efficiency | | 0
Energy-Weighted Flow Matching for Offline Reinforcement Learning | | 0
Comparing Model-free and Model-based Algorithms for Offline Reinforcement Learning | | 0
Automatic Trade-off Adaptation in Offline RL | | 0
End-to-end Offline Reinforcement Learning for Glycemia Control | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | KFC | Average Reward | 81.8 | | Unverified
2 | ADMPO | Average Reward | 81 | | Unverified
3 | Decision Transformer (DT) | Average Reward | 73.5 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | ParPI | D4RL Normalized Score | 151.4 | | Unverified