SOTAVerified

Offline RL

Papers

Showing 151-200 of 755 papers

Title | Status | Hype
Language-Conditioned Offline RL for Multi-Robot Navigation | - | 0
A Simulation Benchmark for Autonomous Racing with Large-Scale Human Data | Code | 2
Diffusion Models as Optimizers for Efficient Planning in Offline RL | Code | 0
ROLeR: Effective Reward Shaping in Offline Reinforcement Learning for Recommender Systems | Code | 0
Sparsity-based Safety Conservatism for Constrained Offline Reinforcement Learning | - | 0
BECAUSE: Bilinear Causal Representation for Generalizable Offline Model-based Reinforcement Learning | - | 0
Pessimism Meets Risk: Risk-Sensitive Offline Reinforcement Learning | - | 0
FOSP: Fine-tuning Offline Safe Policy through World Models | - | 0
Robust Decision Transformer: Tackling Data Corruption in Offline RL via Sequence Modeling | - | 0
To Switch or Not to Switch? Balanced Policy Switching in Offline Reinforcement Learning | - | 0
Benchmarks for Reinforcement Learning with Biased Offline Data and Imperfect Simulators | - | 0
Model-based Offline Reinforcement Learning with Lower Expectile Q-Learning | - | 0
Preference Elicitation for Offline Reinforcement Learning | - | 0
Equivariant Offline Reinforcement Learning | - | 0
Urban-Focused Multi-Task Offline Reinforcement Learning with Contrastive Data Sharing | - | 0
Order-Optimal Instance-Dependent Bounds for Offline Reinforcement Learning with Preference Feedback | - | 0
The Role of Inherent Bellman Error in Offline Reinforcement Learning with Linear Function Approximation | - | 0
DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning | Code | 3
Binary Reward Labeling: Bridging Offline Preference and Reward-Based Reinforcement Learning | - | 0
SeMOPO: Learning High-quality Model and Policy from Low-quality Offline Visual Datasets | - | 0
DiffPoGAN: Diffusion Policies with Generative Adversarial Networks for Offline Reinforcement Learning | - | 0
A Dual Approach to Imitation Learning from Observations with Offline Datasets | - | 0
Is Value Learning Really the Main Bottleneck in Offline RL? | Code | 3
Augmenting Offline RL with Unlabeled Data | - | 0
CDSA: Conservative Denoising Score-based Algorithm for Offline Reinforcement Learning | - | 0
Integrating Domain Knowledge for handling Limited Data in Offline RL | - | 0
PlanDQ: Hierarchical Plan Orchestration via D-Conductor and Q-Performer | Code | 1
Is Value Functions Estimation with Classification Plug-and-play for Offline Reinforcement Learning? | Code | 0
Discovering Multiple Solutions from a Single Task in Offline Reinforcement Learning | - | 0
Decision Mamba: A Multi-Grained State Space Model with Self-Evolution Regularization for Offline RL | Code | 0
Stabilizing Extreme Q-learning by Maclaurin Expansion | Code | 0
Strategically Conservative Q-Learning | Code | 1
Self-Play with Adversarial Critic: Provable and Scalable Offline Alignment for Language Models | - | 0
UDQL: Bridging The Gap between MSE Loss and The Optimal Value Function in Offline Reinforcement Learning | - | 0
A Fast Convergence Theory for Offline Decision Making | - | 0
Causal prompting model-based offline reinforcement learning | - | 0
Diffusion Policies creating a Trust Region for Offline Reinforcement Learning | Code | 1
Inverse Concave-Utility Reinforcement Learning is Inverse Game Theory | - | 0
Preferred-Action-Optimized Diffusion Policies for Offline Reinforcement Learning | - | 0
Reinforcement Learning in Dynamic Treatment Regimes Needs Critical Reexamination | Code | 1
Offline-Boosted Actor-Critic: Adaptively Blending Optimal Historical Behaviors in Deep Off-Policy RL | Code | 1
AlignIQL: Policy Alignment in Implicit Q-Learning through Constrained Optimization | Code | 0
Unified Preference Optimization: Language Model Alignment Beyond the Preference Frontier | - | 0
OPERA: Automatic Offline Policy Evaluation with Re-weighted Aggregates of Multiple Estimators | - | 0
Trajectory Data Suffices for Statistically Efficient Learning in Offline RL with Linear q^π-Realizability and Concentrability | - | 0
Any-step Dynamics Model Improves Future Predictions for Online and Offline Reinforcement Learning | Code | 2
Q-value Regularized Transformer for Offline Reinforcement Learning | Code | 1
GTA: Generative Trajectory Augmentation with Guidance for Offline Reinforcement Learning | Code | 1
Diffusion-based Reinforcement Learning via Q-weighted Variational Policy Optimization | Code | 2
Generating Code World Models with Large Language Models Guided by Monte Carlo Tree Search | Code | 1
Page 4 of 16

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | KFC | Average Reward | 81.8 | - | Unverified
2 | ADMPO | Average Reward | 81 | - | Unverified
3 | Decision Transformer (DT) | Average Reward | 73.5 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ParPI | D4RL Normalized Score | 151.4 | - | Unverified