SOTAVerified

Offline RL

Papers

Showing 551–600 of 755 papers

Title | Status | Hype
Diffusion-Based Offline RL for Improved Decision-Making in Augmented ARC Task | | 0
Diffusion-DICE: In-Sample Diffusion Guidance for Offline Reinforcement Learning | | 0
Diffusion Policies for Out-of-Distribution Generalization in Offline Reinforcement Learning | | 0
Diffusion Self-Weighted Guidance for Offline Reinforcement Learning | | 0
Discovering Multiple Solutions from a Single Task in Offline Reinforcement Learning | | 0
Distributionally Robust Model-Based Offline Reinforcement Learning with Near-Optimal Sample Complexity | | 0
Distributionally Robust Offline Reinforcement Learning with Linear Function Approximation | | 0
Diverse Transformer Decoding for Offline Reinforcement Learning Using Financial Algorithmic Approaches | | 0
Domain Adaptation for Offline Reinforcement Learning with Limited Samples | | 0
Domain Generalization for Robust Model-Based Offline Reinforcement Learning | | 0
DOMAIN: MilDly COnservative Model-BAsed OfflINe Reinforcement Learning | | 0
Double Pessimism is Provably Efficient for Distributionally Robust Offline Reinforcement Learning: Generic Algorithm and Robust Partial Coverage | | 0
DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization | | 0
DRDT3: Diffusion-Refined Decision Test-Time Training Model | | 0
Dual Generator Offline Reinforcement Learning | | 0
Efficient Imitation Learning with Conservative World Models | | 0
Efficient Online RL Fine Tuning with Offline Pre-trained Policy Only | | 0
EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL | | 0
Empowering Embodied Visual Tracking with Visual Foundation Models and Offline RL | | 0
Enabling A Network AI Gym for Autonomous Cyber Agents | | 0
End-to-End Offline Goal-Oriented Dialog Policy Learning via Policy Gradient | | 0
End-to-end Offline Reinforcement Learning for Glycemia Control | | 0
Energy-Weighted Flow Matching for Offline Reinforcement Learning | | 0
Enhanced DACER Algorithm with High Diffusion Efficiency | | 0
Enhancing Cross-domain Pre-Trained Decision Transformers with Adaptive Attention | | 0
Enhancing Offline Model-Based RL via Active Model Selection: A Bayesian Optimization Perspective | | 0
Enhancing Pre-Trained Decision Transformers with Prompt-Tuning Bandits | | 0
Enhancing Reinforcement Learning Through Guided Search | | 0
ENOTO: Improving Offline-to-Online Reinforcement Learning with Q-Ensembles | | 0
Ensemble Successor Representations for Task Generalization in Offline-to-Online Reinforcement Learning | | 0
Environment Transformer and Policy Optimization for Model-Based Offline Reinforcement Learning | | 0
Equivariant Data Augmentation for Generalization in Offline Reinforcement Learning | | 0
Equivariant Offline Reinforcement Learning | | 0
Evaluation of Active Feature Acquisition Methods for Static Feature Settings | | 0
Evaluation-Time Policy Switching for Offline Reinforcement Learning | | 0
Exclusively Penalized Q-learning for Offline Reinforcement Learning | | 0
Exploiting Generalization in Offline Reinforcement Learning via Unseen State Augmentations | | 0
Exploring the Potential of Offline RL for Reasoning in LLMs: A Preliminary Study | | 0
A Tractable Inference Perspective of Offline RL | | 0
Feasibility-Aware Pessimistic Estimation: Toward Long-Horizon Safety in Offline RL | | 0
Federated Offline Reinforcement Learning | | 0
Federated Offline Reinforcement Learning: Collaborative Single-Policy Coverage Suffices | | 0
Fighting Uncertainty with Gradients: Offline Reinforcement Learning via Diffusion Score Matching | | 0
Finer Behavioral Foundation Models via Auto-Regressive Features and Advantage Weighting | | 0
Finetuning from Offline Reinforcement Learning: Challenges, Trade-offs and Practical Solutions | | 0
Finetuning Offline World Models in the Real World | | 0
Flexible Blood Glucose Control: Offline Reinforcement Learning from Human Feedback | | 0
Flow-Based Single-Step Completion for Efficient and Expressive Policy Learning | | 0
FOSP: Fine-tuning Offline Safe Policy through World Models | | 0
From Novelty to Imitation: Self-Distilled Rewards for Offline Reinforcement Learning | | 0
Page 12 of 16

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | KFC | Average Reward | 81.8 | | Unverified
2 | ADMPO | Average Reward | 81 | | Unverified
3 | Decision Transformer (DT) | Average Reward | 73.5 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ParPI | D4RL Normalized Score | 151.4 | | Unverified
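The second benchmark reports a D4RL-style normalized score rather than a raw average reward. D4RL maps a raw episode return onto a 0–100 scale anchored at a random policy (0) and an expert policy (100); scores above 100 mean the agent beat the expert reference. A minimal sketch of that normalization (the reference scores in the example are illustrative placeholders, not values from any table above):

```python
def d4rl_normalized_score(raw_return: float,
                          random_score: float,
                          expert_score: float) -> float:
    """Map a raw return to the D4RL 0-100 scale.

    0 corresponds to the random-policy reference, 100 to the
    expert reference; values above 100 indicate the policy
    outperformed the expert on that task.
    """
    return 100.0 * (raw_return - random_score) / (expert_score - random_score)

# Illustrative (made-up) reference scores:
print(d4rl_normalized_score(75.0, random_score=0.0, expert_score=50.0))  # 150.0
```

Because each D4RL task has its own random/expert anchors, normalized scores are comparable across tasks in a way raw average rewards (as in the first table) are not.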