SOTAVerified

Offline RL

Papers

Showing 251–300 of 755 papers

Title | Status | Hype
FOSP: Fine-tuning Offline Safe Policy through World Models | | 0
End-to-End Offline Goal-Oriented Dialog Policy Learning via Policy Gradient | | 0
From Novelty to Imitation: Self-Distilled Rewards for Offline Reinforcement Learning | | 0
ComaDICE: Offline Cooperative Multi-Agent Reinforcement Learning with Stationary Distribution Shift Regularization | | 0
Enabling A Network AI Gym for Autonomous Cyber Agents | | 0
Empowering Embodied Visual Tracking with Visual Foundation Models and Offline RL | | 0
Augmenting Offline RL with Unlabeled Data | | 0
EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL | | 0
CLUE: Calibrated Latent Guidance for Offline Reinforcement Learning | | 0
Efficient Online RL Fine Tuning with Offline Pre-trained Policy Only | | 0
A Fast Convergence Theory for Offline Decision Making | | 0
A Fully Data-Driven Approach for Realistic Traffic Signal Control Using Offline Reinforcement Learning | | 0
Learning to View: Decision Transformers for Active Object Detection | | 0
ChiPFormer: Transferable Chip Placement via Offline Decision Transformer | | 0
Efficient Imitation Learning with Conservative World Models | | 0
Optimal Uniform OPE and Model-based Offline Reinforcement Learning in Time-Homogeneous, Reward-Free and Task-Agnostic Settings | | 0
Adversarially Trained Weighted Actor-Critic for Safe Offline Reinforcement Learning | | 0
Dual Generator Offline Reinforcement Learning | | 0
A Survey on Model-based Reinforcement Learning | | 0
Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning | | 0
Learning Pseudometric-based Action Representations for Offline Reinforcement Learning | | 0
DRDT3: Diffusion-Refined Decision Test-Time Training Model | | 0
DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization | | 0
CDSA: Conservative Denoising Score-based Algorithm for Offline Reinforcement Learning | | 0
Double Pessimism is Provably Efficient for Distributionally Robust Offline Reinforcement Learning: Generic Algorithm and Robust Partial Coverage | | 0
A Survey of Zero-shot Generalisation in Deep Reinforcement Learning | | 0
A Strong Baseline for Batch Imitation Learning | | 0
Causal prompting model-based offline reinforcement learning | | 0
DOMAIN: MilDly COnservative Model-BAsed OfflINe Reinforcement Learning | | 0
Domain Generalization for Robust Model-Based Offline Reinforcement Learning | | 0
Prior-Guided Diffusion Planning for Offline Reinforcement Learning | | 0
Learning Dexterous Manipulation from Suboptimal Experts | | 0
Learning to Clarify by Reinforcement Learning Through Reward-Weighted Fine-Tuning | | 0
Learning Value Functions from Undirected State-only Experience | | 0
Language Decision Transformers with Exponential Tilt for Interactive Text Environments | | 0
Domain Adaptation for Offline Reinforcement Learning with Limited Samples | | 0
Can Offline Reinforcement Learning Help Natural Language Understanding? | | 0
Diverse Transformer Decoding for Offline Reinforcement Learning Using Financial Algorithmic Approaches | | 0
Distributionally Robust Offline Reinforcement Learning with Linear Function Approximation | | 0
Enhancing Video Analytics Accuracy via Real-time Automated Camera Parameter Tuning | | 0
Distributionally Robust Model-Based Offline Reinforcement Learning with Near-Optimal Sample Complexity | | 0
Discovering Multiple Solutions from a Single Task in Offline Reinforcement Learning | | 0
Cache-Efficient Posterior Sampling for Reinforcement Learning with LLM-Derived Priors Across Discrete and Continuous Domains | | 0
A Simple Unified Uncertainty-Guided Framework for Offline-to-Online Reinforcement Learning | | 0
Advancing RAN Slicing with Offline Reinforcement Learning | | 0
ARMOR: A Model-based Framework for Improving Arbitrary Baseline Policies with Offline Data | | 0
Diffusion Self-Weighted Guidance for Offline Reinforcement Learning | | 0
Diffusion Policies for Out-of-Distribution Generalization in Offline Reinforcement Learning | | 0
Budgeting Counterfactual for Offline RL | | 0
A Dual Approach to Imitation Learning from Observations with Offline Datasets | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | KFC | Average Reward | 81.8 | | Unverified
2 | ADMPO | Average Reward | 81 | | Unverified
3 | Decision Transformer (DT) | Average Reward | 73.5 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ParPI | D4RL Normalized Score | 151.4 | | Unverified