SOTAVerified

Offline RL

Papers

Showing 151–200 of 755 papers

Title | Status | Hype
Adaptive Behavior Cloning Regularization for Stable Offline-to-Online Reinforcement Learning | Code | 1
Decision Transformer: Reinforcement Learning via Sequence Modeling | Code | 1
Beyond Pick-and-Place: Tackling Robotic Stacking of Diverse Shapes | Code | 1
Leveraging Demonstrations with Latent Space Priors | Code | 1
Generating Code World Models with Large Language Models Guided by Monte Carlo Tree Search | Code | 1
An Optimistic Perspective on Offline Deep Reinforcement Learning | Code | 1
Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets | Code | 1
LTLDoG: Satisfying Temporally-Extended Symbolic Constraints for Safe Diffusion-based Planning | Code | 1
Harnessing Mixed Offline Reinforcement Learning Datasets via Trajectory Weighting | Code | 1
Model-Bellman Inconsistency for Model-based Offline Reinforcement Learning | Code | 1
Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization | Code | 1
MOPO: Model-based Offline Policy Optimization | Code | 1
NeoRL-2: Near Real-World Benchmarks for Offline Reinforcement Learning with Extended Realistic Scenarios | Code | 1
Neural Laplace Control for Continuous-time Delayed Systems | Code | 1
Offline-Boosted Actor-Critic: Adaptively Blending Optimal Historical Behaviors in Deep Off-Policy RL | Code | 1
Offline Meta-Reinforcement Learning with Advantage Weighting | Code | 1
CROP: Conservative Reward for Model-based Offline Policy Optimization | Code | 1
Critic Regularized Regression | Code | 1
Offline Reinforcement Learning for Safer Blood Glucose Control in People with Type 1 Diabetes | Code | 1
Offline Reinforcement Learning for Visual Navigation | Code | 1
Offline Reinforcement Learning with Implicit Q-Learning | Code | 1
Offline Reinforcement Learning with In-sample Q-Learning | Code | 1
Behavior Proximal Policy Optimization | Code | 1
Offline Reinforcement Learning with Reverse Model-based Imagination | Code | 1
Critic-Guided Decision Transformer for Offline Reinforcement Learning | Code | 1
Are Expressive Models Truly Necessary for Offline RL? | Code | 1
Counterfactual Conservative Q Learning for Offline Multi-agent Reinforcement Learning | Code | 1
Diffusion Policies creating a Trust Region for Offline Reinforcement Learning | Code | 1
Online and Offline Reinforcement Learning by Planning with a Learned Model | Code | 1
Online reinforcement learning with sparse rewards through an active inference capsule | Code | 1
Federated Ensemble-Directed Offline Reinforcement Learning | Code | 1
Direct Preference-based Policy Optimization without Reward Modeling | Code | 1
PDiT: Interleaving Perception and Decision-making Transformers for Deep Reinforcement Learning | Code | 1
Discriminator-Weighted Offline Imitation Learning from Suboptimal Demonstrations | Code | 1
When Data Geometry Meets Deep Function: Generalizing Offline Reinforcement Learning | Code | 1
Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning | Code | 1
cosFormer: Rethinking Softmax in Attention | Code | 1
PlanDQ: Hierarchical Plan Orchestration via D-Conductor and Q-Performer | Code | 1
When should we prefer Decision Transformers for Offline Reinforcement Learning? | Code | 1
Policy Regularization with Dataset Constraint for Offline Reinforcement Learning | Code | 1
Optimistic Curiosity Exploration and Conservative Exploitation with Linear Reward Shaping | Code | 1
COptiDICE: Offline Constrained Reinforcement Learning via Stationary Distribution Correction Estimation | Code | 1
Extreme Q-Learning: MaxEnt RL without Entropy | Code | 1
Adversarially Trained Actor Critic for Offline Reinforcement Learning | Code | 1
MoCoDA: Model-based Counterfactual Data Augmentation | Code | 1
Doubly Mild Generalization for Offline Reinforcement Learning | Code | 1
Reinformer: Max-Return Sequence Modeling for Offline RL | Code | 1
RAMBO-RL: Robust Adversarial Model-Based Offline Reinforcement Learning | Code | 1
Behavioral Entropy-Guided Dataset Generation for Offline Reinforcement Learning | - | 0
Contrastive Value Learning: Implicit Models for Simple Offline RL | - | 0
Page 4 of 16

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | KFC | Average Reward | 81.8 | - | Unverified
2 | ADMPO | Average Reward | 81 | - | Unverified
3 | Decision Transformer (DT) | Average Reward | 73.5 | - | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | ParPID | D4RL Normalized Score | 151.4 | - | Unverified