SOTAVerified

Offline RL

Papers

Showing 351–400 of 755 papers

Title | Status | Hype
Diffusion Policies for Out-of-Distribution Generalization in Offline Reinforcement Learning | - | 0
Alleviating Matthew Effect of Offline Reinforcement Learning in Interactive Recommendation | Code | 1
Goal-Conditioned Predictive Coding for Offline Reinforcement Learning | - | 0
Offline Reinforcement Learning with Imbalanced Datasets | - | 0
LLQL: Logistic Likelihood Q-Learning for Reinforcement Learning | - | 0
Model-Bellman Inconsistency for Model-based Offline Reinforcement Learning | Code | 1
Prioritized Trajectory Replay: A Replay Memory for Data-driven Reinforcement Learning | - | 0
Design from Policies: Conservative Test-Time Adaptation for Offline Policy Optimization | - | 0
ChiPFormer: Transferable Chip Placement via Offline Decision Transformer | - | 0
Fighting Uncertainty with Gradients: Offline Reinforcement Learning via Diffusion Score Matching | - | 0
Offline Policy Evaluation for Reinforcement Learning with Adaptively Collected Data | - | 0
CLUE: Calibrated Latent Guidance for Offline Reinforcement Learning | - | 0
Harnessing Mixed Offline Reinforcement Learning Datasets via Trajectory Weighting | Code | 1
Beyond OOD State Actions: Supported Cross-Domain Offline Reinforcement Learning | Code | 1
Warm-Start Actor-Critic: From Approximation Error to Sub-optimality Gap | - | 0
π2vec: Policy Representations with Successor Features | - | 0
Automatic Trade-off Adaptation in Offline RL | - | 0
Semi-Offline Reinforcement Learning for Optimized Text Generation | Code | 0
Offline Multi-Agent Reinforcement Learning with Coupled Value Factorization | - | 0
Provably Efficient Offline Reinforcement Learning with Perturbed Data Sources | - | 0
Off-policy Evaluation in Doubly Inhomogeneous Environments | Code | 0
Unified Off-Policy Learning to Rank: a Reinforcement Learning Perspective | Code | 0
Pruning the Way to Reliable Policies: A Multi-Objective Deep Q-Learning Approach to Critical Care | - | 0
A Simple Unified Uncertainty-Guided Framework for Offline-to-Online Reinforcement Learning | - | 0
ENOTO: Improving Offline-to-Online Reinforcement Learning with Q-Ensembles | - | 0
Policy Regularization with Dataset Constraint for Offline Reinforcement Learning | Code | 1
Iteratively Refined Behavior Regularization for Offline Reinforcement Learning | - | 0
Instructed Diffuser with Temporal Condition Guidance for Offline Reinforcement Learning | - | 0
Decoupled Prioritized Resampling for Offline RL | Code | 1
Look Beneath the Surface: Exploiting Fundamental Symmetry for Sample-Efficient Offline RL | Code | 1
Mildly Constrained Evaluation Policy for Offline Reinforcement Learning | Code | 0
PEARL: Zero-shot Cross-task Preference Alignment and Robust Reward Learning for Robotic Manipulation | - | 0
State Regularized Policy Optimization on Data with Dynamics Shift | - | 0
Survival Instinct in Offline Reinforcement Learning | - | 0
Delphic Offline Reinforcement Learning under Nonidentifiable Hidden Confounding | - | 0
Improving and Benchmarking Offline Reinforcement Learning Algorithms | Code | 1
Improving Offline RL by Blending Heuristics | - | 0
IQL-TD-MPC: Implicit Q-Learning for Hierarchical Model Predictive Control | - | 0
Achieving Fairness in Multi-Agent Markov Decision Processes Using Reinforcement Learning | - | 0
Efficient Diffusion Policies for Offline Reinforcement Learning | Code | 1
Primal-Attention: Self-attention through Asymmetric Kernel SVD in Primal Representation | Code | 1
What is Essential for Unseen Goal Generalization of Offline Goal-conditioned RL? | Code | 0
Robust Reinforcement Learning Objectives for Sequential Recommender Systems | Code | 0
Reinforcement Learning with Human Feedback: Learning Dynamic Choices via Pessimism | - | 0
MADiff: Offline Multi-agent Learning with Diffusion Models | Code | 1
The Benefits of Being Distributional: Small-Loss Bounds for Reinforcement Learning | Code | 0
Beyond Reward: Offline Preference-guided Policy Optimization | Code | 0
Making Offline RL Online: Collaborative World Models for Offline Visual Reinforcement Learning | Code | 1
Leftover Lunch: Advantage-based Offline Reinforcement Learning for Language Models | Code | 1
When should we prefer Decision Transformers for Offline Reinforcement Learning? | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | KFC | Average Reward | 81.8 | - | Unverified
2 | ADMPO | Average Reward | 81 | - | Unverified
3 | Decision Transformer (DT) | Average Reward | 73.5 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ParPI | D4RL Normalized Score | 151.4 | - | Unverified