SOTAVerified

Offline RL

Papers

Showing 251–300 of 755 papers

Title | Status | Hype
Diffusion Models as Optimizers for Efficient Planning in Offline RL | Code | 0
POCE: Primal Policy Optimization with Conservative Estimation for Multi-constraint Offline Reinforcement Learning | Code | 0
Preference-Guided Reflective Sampling for Aligning Language Models | Code | 0
Bridging Distributionally Robust Learning and Offline RL: An Approach to Mitigate Distribution Shift and Partial Data Coverage | Code | 0
BRAC+: Improved Behavior Regularized Actor Critic for Offline Reinforcement Learning | Code | 0
On the Effectiveness of Offline RL for Dialogue Response Generation | Code | 0
DiffCPS: Diffusion Model based Constrained Policy Search for Offline Reinforcement Learning | Code | 0
Optimistic Critic Reconstruction and Constrained Fine-Tuning for General Offline-to-Online RL | Code | 0
DiffClone: Enhanced Behaviour Cloning in Robotics with Diffusion-Driven Policy Learning | Code | 0
Off-policy Evaluation in Doubly Inhomogeneous Environments | Code | 0
A Connection between One-Step Regularization and Critic Regularization in Reinforcement Learning | Code | 0
Offline RL With Resource Constrained Online Deployment | Code | 0
Offline RL with Smooth OOD Generalization in Convex Hull and its Neighborhood | Code | 0
On Practical Reinforcement Learning: Provable Robustness, Scalability, and Statistical Efficiency | Code | 0
Solving Offline Reinforcement Learning with Decision Tree Regression | Code | 0
Beyond Reward: Offline Preference-guided Policy Optimization | Code | 0
DeepAveragers: Offline Reinforcement Learning by Solving Derived Non-Parametric MDPs | Code | 0
Decision Transformer under Random Frame Dropping | Code | 0
Decision Mamba: A Multi-Grained State Space Model with Self-Evolution Regularization for Offline RL | Code | 0
DCUR: Data Curriculum for Teaching via Samples with Reinforcement Learning | Code | 0
Offline Equilibrium Finding | Code | 0
Offline Data Enhanced On-Policy Policy Gradient with Provable Guarantees | Code | 0
Offline Reinforcement Learning from Datasets with Structured Non-Stationarity | Code | 0
NetworkGym: Reinforcement Learning Environments for Multi-Access Traffic Management in Network Simulation | Code | 0
Multi-Game Decision Transformers | Code | 0
Mutual Information Regularized Offline Reinforcement Learning | Code | 0
MORE-3S: Multimodal-based Offline Reinforcement Learning with Shared Semantic Spaces | Code | 0
Model-based Offline Reinforcement Learning with Count-based Conservatism | Code | 0
Model-based Offline Policy Optimization with Adversarial Network | Code | 0
d3rlpy: An Offline Deep Reinforcement Learning Library | Code | 0
Model-Based Offline Planning with Trajectory Pruning | Code | 0
Two-step reinforcement learning for model-free redesign of nonlinear optimal regulator | Code | 0
MICRO: Model-Based Offline Reinforcement Learning with a Conservative Bellman Operator | Code | 0
MAHALO: Unifying Offline Reinforcement Learning and Imitation Learning from Observations | Code | 0
Model-Based Offline Reinforcement Learning with Pessimism-Modulated Dynamics Belief | Code | 0
Mildly Constrained Evaluation Policy for Offline Reinforcement Learning | Code | 0
A Low Latency Adaptive Coding Spiking Framework for Deep Reinforcement Learning | Code | 0
Learning Versatile Skills with Curriculum Masking | Code | 0
Learning to Reach Goals via Diffusion | Code | 0
Behavior Prior Representation learning for Offline Reinforcement Learning | Code | 0
Learning to Trust Bellman Updates: Selective State-Adaptive Regularization for Offline RL | Code | 0
Latent Safety-Constrained Policy Approach for Safe Offline Reinforcement Learning | Code | 0
Corruption-Robust Offline Reinforcement Learning with General Function Approximation | Code | 0
Learning from Sparse Offline Datasets via Conservative Density Estimation | Code | 0
Is Value Functions Estimation with Classification Plug-and-play for Offline Reinforcement Learning? | Code | 0
Behavior Estimation from Multi-Source Data for Offline Reinforcement Learning | Code | 0
Is Mamba Compatible with Trajectory Optimization in Offline Reinforcement Learning? | Code | 0
Learning to Control Autonomous Fleets from Observation via Offline Reinforcement Learning | Code | 0
Leveraging Unlabeled Data Sharing through Kernel Function Approximation in Offline Reinforcement Learning | Code | 0
Regularizing a Model-based Policy Stationary Distribution to Stabilize Offline Reinforcement Learning | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | KFC | Average Reward | 81.8 | | Unverified
2 | ADMPO | Average Reward | 81 | | Unverified
3 | Decision Transformer (DT) | Average Reward | 73.5 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ParPI | D4RL Normalized Score | 151.4 | | Unverified