SOTAVerified

Offline RL

Papers

Showing 176-200 of 755 papers

Title | Status | Hype
Integrating Domain Knowledge for handling Limited Data in Offline RL | | 0
PlanDQ: Hierarchical Plan Orchestration via D-Conductor and Q-Performer | Code | 1
Is Value Functions Estimation with Classification Plug-and-play for Offline Reinforcement Learning? | Code | 0
Discovering Multiple Solutions from a Single Task in Offline Reinforcement Learning | | 0
Decision Mamba: A Multi-Grained State Space Model with Self-Evolution Regularization for Offline RL | Code | 0
Stabilizing Extreme Q-learning by Maclaurin Expansion | Code | 0
Strategically Conservative Q-Learning | Code | 1
Self-Play with Adversarial Critic: Provable and Scalable Offline Alignment for Language Models | | 0
UDQL: Bridging The Gap between MSE Loss and The Optimal Value Function in Offline Reinforcement Learning | | 0
A Fast Convergence Theory for Offline Decision Making | | 0
Causal prompting model-based offline reinforcement learning | | 0
Diffusion Policies creating a Trust Region for Offline Reinforcement Learning | Code | 1
Inverse Concave-Utility Reinforcement Learning is Inverse Game Theory | | 0
Preferred-Action-Optimized Diffusion Policies for Offline Reinforcement Learning | | 0
Reinforcement Learning in Dynamic Treatment Regimes Needs Critical Reexamination | Code | 1
Offline-Boosted Actor-Critic: Adaptively Blending Optimal Historical Behaviors in Deep Off-Policy RL | Code | 1
AlignIQL: Policy Alignment in Implicit Q-Learning through Constrained Optimization | Code | 0
Unified Preference Optimization: Language Model Alignment Beyond the Preference Frontier | | 0
OPERA: Automatic Offline Policy Evaluation with Re-weighted Aggregates of Multiple Estimators | | 0
Trajectory Data Suffices for Statistically Efficient Learning in Offline RL with Linear q^π-Realizability and Concentrability | | 0
Any-step Dynamics Model Improves Future Predictions for Online and Offline Reinforcement Learning | Code | 2
Q-value Regularized Transformer for Offline Reinforcement Learning | Code | 1
GTA: Generative Trajectory Augmentation with Guidance for Offline Reinforcement Learning | Code | 1
Diffusion-based Reinforcement Learning via Q-weighted Variational Policy Optimization | Code | 2
Generating Code World Models with Large Language Models Guided by Monte Carlo Tree Search | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | KFC | Average Reward | 81.8 | | Unverified
2 | ADMPO | Average Reward | 81 | | Unverified
3 | Decision Transformer (DT) | Average Reward | 73.5 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ParPI | D4RL Normalized Score | 151.4 | | Unverified