SOTAVerified

Offline RL

Papers

Showing 451–500 of 755 papers

Title | Status | Hype
Leveraging Optimal Transport for Enhanced Offline Reinforcement Learning in Surgical Robotic Environments | | 0
Bi-Level Offline Policy Optimization with Limited Exploration | | 0
Planning to Go Out-of-Distribution in Offline-to-Online Reinforcement Learning | | 0
DiffCPS: Diffusion Model based Constrained Policy Search for Offline Reinforcement Learning | Code | 0
Improving Offline-to-Online Reinforcement Learning with Q Conditioned State Entropy Exploration | | 0
Self-Confirming Transformer for Belief-Conditioned Adaptation in Offline Multi-Agent Reinforcement Learning | | 0
Learning to Reach Goals via Diffusion | Code | 0
Pessimistic Nonlinear Least-Squares Value Iteration for Offline Reinforcement Learning | | 0
Towards Robust Offline-to-Online Reinforcement Learning via Uncertainty and Smoothness | Code | 0
Uncertainty-Aware Decision Transformer for Stochastic Driving Environments | | 0
Boosting Offline Reinforcement Learning for Autonomous Driving with Hierarchical Latent Skills | | 0
Robotic Offline RL from Internet Videos via Value-Function Pre-Training | | 0
H2O+: An Improved Framework for Hybrid Offline-and-Online RL with Dynamics Gaps | | 0
Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions | | 0
DOMAIN: MilDly COnservative Model-BAsed OfflINe Reinforcement Learning | | 0
Equivariant Data Augmentation for Generalization in Offline Reinforcement Learning | | 0
Model-based Offline Policy Optimization with Adversarial Network | Code | 0
Hundreds Guide Millions: Adaptive Offline Reinforcement Learning with Expert Guidance | | 0
Multi-Objective Decision Transformers for Offline Reinforcement Learning | | 0
Reinforced Self-Training (ReST) for Language Modeling | | 0
Real Robot Challenge 2022: Learning Dexterous Manipulation from Offline Data in the Real World | | 0
Exploiting Generalization in Offline Reinforcement Learning via Unseen State Augmentations | | 0
Integrating Offline Reinforcement Learning with Transformers for Sequential Recommendation | | 0
Contrastive Example-Based Control | Code | 0
A Connection between One-Step Regularization and Critic Regularization in Reinforcement Learning | Code | 0
On the Effectiveness of Offline RL for Dialogue Response Generation | Code | 0
Model-based Offline Reinforcement Learning with Count-based Conservatism | Code | 0
PASTA: Pretrained Action-State Transformer Agents | | 0
Budgeting Counterfactual for Offline RL | | 0
Diffusion Policies for Out-of-Distribution Generalization in Offline Reinforcement Learning | | 0
Goal-Conditioned Predictive Coding for Offline Reinforcement Learning | | 0
Offline Reinforcement Learning with Imbalanced Datasets | | 0
LLQL: Logistic Likelihood Q-Learning for Reinforcement Learning | | 0
Prioritized Trajectory Replay: A Replay Memory for Data-driven Reinforcement Learning | | 0
Design from Policies: Conservative Test-Time Adaptation for Offline Policy Optimization | | 0
ChiPFormer: Transferable Chip Placement via Offline Decision Transformer | | 0
Fighting Uncertainty with Gradients: Offline Reinforcement Learning via Diffusion Score Matching | | 0
Offline Policy Evaluation for Reinforcement Learning with Adaptively Collected Data | | 0
CLUE: Calibrated Latent Guidance for Offline Reinforcement Learning | | 0
Warm-Start Actor-Critic: From Approximation Error to Sub-optimality Gap | | 0
Automatic Trade-off Adaptation in Offline RL | | 0
Semi-Offline Reinforcement Learning for Optimized Text Generation | Code | 0
π2vec: Policy Representations with Successor Features | | 0
Offline Multi-Agent Reinforcement Learning with Coupled Value Factorization | | 0
Provably Efficient Offline Reinforcement Learning with Perturbed Data Sources | | 0
Off-policy Evaluation in Doubly Inhomogeneous Environments | Code | 0
A Simple Unified Uncertainty-Guided Framework for Offline-to-Online Reinforcement Learning | | 0
Unified Off-Policy Learning to Rank: a Reinforcement Learning Perspective | Code | 0
Pruning the Way to Reliable Policies: A Multi-Objective Deep Q-Learning Approach to Critical Care | | 0
ENOTO: Improving Offline-to-Online Reinforcement Learning with Q-Ensembles | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | KFC | Average Reward | 81.8 | | Unverified
2 | ADMPO | Average Reward | 81 | | Unverified
3 | Decision Transformer (DT) | Average Reward | 73.5 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ParPI | D4RL Normalized Score | 151.4 | | Unverified