SOTAVerified

Offline RL

Papers

Showing 101–150 of 755 papers

Title | Status | Hype
Dual RL: Unification and New Methods for Reinforcement and Imitation Learning | Code | 1
ImagineBench: Evaluating Reinforcement Learning with Large Language Model Rollouts | Code | 1
DataLight: Offline Data-Driven Traffic Signal Control | Code | 1
In-Dataset Trajectory Return Regularization for Offline Preference-based Reinforcement Learning | Code | 1
Making Offline RL Online: Collaborative World Models for Offline Visual Reinforcement Learning | Code | 1
Offline RL Without Off-Policy Evaluation | Code | 1
COMBO: Conservative Offline Model-Based Policy Optimization | Code | 1
Agent-Controller Representations: Principled Offline RL with Rich Exogenous Information | Code | 1
Latent-Variable Advantage-Weighted Policy Optimization for Offline RL | Code | 1
Online reinforcement learning with sparse rewards through an active inference capsule | Code | 1
A Workflow for Offline Model-Free Robotic Reinforcement Learning | Code | 1
Conservative Offline Distributional Reinforcement Learning | Code | 1
Conservative Q-Learning for Offline Reinforcement Learning | Code | 1
Zero-Shot Reinforcement Learning from Low Quality Data | Code | 1
Reliable Conditioning of Behavioral Cloning for Offline Reinforcement Learning | Code | 1
Consistency Models as a Rich and Efficient Policy Class for Reinforcement Learning | Code | 1
Constraint-Adaptive Policy Switching for Offline Safe Reinforcement Learning | Code | 1
Let Offline RL Flow: Training Conservative Agents in the Latent Space of Normalizing Flows | Code | 1
Alleviating Matthew Effect of Offline Reinforcement Learning in Interactive Recommendation | Code | 1
Leveraging Factored Action Spaces for Efficient Offline Reinforcement Learning in Healthcare | Code | 1
Look Beneath the Surface: Exploiting Fundamental Symmetry for Sample-Efficient Offline RL | Code | 1
Diffusion Policies creating a Trust Region for Offline Reinforcement Learning | Code | 1
All You Need Is Supervised Learning: From Imitation Learning to Meta-RL With Upside Down RL | Code | 1
MADiff: Offline Multi-agent Learning with Diffusion Models | Code | 1
Don't Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning | Code | 1
Policy Regularization with Dataset Constraint for Offline Reinforcement Learning | Code | 1
Offline Meta-Reinforcement Learning with Advantage Weighting | Code | 1
Decoupled Prioritized Resampling for Offline RL | Code | 1
NeoRL: A Near Real-World Benchmark for Offline Reinforcement Learning | Code | 1
MOPO: Model-based Offline Policy Optimization | Code | 1
NeoRL-2: Near Real-World Benchmarks for Offline Reinforcement Learning with Extended Realistic Scenarios | Code | 1
Behavior Transformers: Cloning k modes with one stone | Code | 1
Model-Bellman Inconsistency for Model-based Offline Reinforcement Learning | Code | 1
Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization | Code | 1
Adaptive Behavior Cloning Regularization for Stable Offline-to-Online Reinforcement Learning | Code | 1
Model Selection for Offline Reinforcement Learning: Practical Considerations for Healthcare Settings | Code | 1
Neural Laplace Control for Continuous-time Delayed Systems | Code | 1
CROP: Conservative Reward for Model-based Offline Policy Optimization | Code | 1
Curriculum Offline Imitation Learning | Code | 1
A Minimalist Approach to Offline Reinforcement Learning | Code | 1
Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning | Code | 1
MOReL: Model-Based Offline Reinforcement Learning | Code | 1
Acme: A Research Framework for Distributed Reinforcement Learning | Code | 1
Critic Regularized Regression | Code | 1
Behavior Proximal Policy Optimization | Code | 1
Critic-Guided Decision Transformer for Offline Reinforcement Learning | Code | 1
Counterfactual Conservative Q Learning for Offline Multi-agent Reinforcement Learning | Code | 1
Direct Preference-based Policy Optimization without Reward Modeling | Code | 1
When Data Geometry Meets Deep Function: Generalizing Offline Reinforcement Learning | Code | 1
Guiding Online Reinforcement Learning with Action-Free Offline Pretraining | Code | 1
Page 3 of 16

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | KFC | Average Reward | 81.8 | — | Unverified
2 | ADMPO | Average Reward | 81 | — | Unverified
3 | Decision Transformer (DT) | Average Reward | 73.5 | — | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | ParPI | D4RL Normalized Score | 151.4 | — | Unverified
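The normalized score reported in the second table follows, by convention in offline RL benchmarks, the D4RL normalization: a raw episodic return is rescaled so that a random policy scores 0 and an expert reference scores 100. A minimal sketch of that convention (the helper name and the reference values below are illustrative, not official D4RL constants or code from this site):

```python
def normalized_score(raw_score: float, random_score: float, expert_score: float) -> float:
    """D4RL-style normalization: 100 * (raw - random) / (expert - random).

    0 corresponds to a random policy, 100 to the expert reference;
    scores above 100 mean the policy beat the expert reference return.
    """
    return 100.0 * (raw_score - random_score) / (expert_score - random_score)


# Illustrative example: a raw return of 3000 on a task where the random
# reference scores 100 and the expert reference scores 2000 normalizes
# to roughly 152.6, i.e. above the expert baseline.
print(round(normalized_score(3000.0, 100.0, 2000.0), 1))
```

This is how a claimed value above 100 (such as 151.4 in the table) can be legitimate: the normalization is relative to fixed reference policies, not an upper bound.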