SOTAVerified

Offline RL

Papers

Showing 201-250 of 755 papers

Title | Status | Hype
Contrastive Example-Based Control | Code | 0
ROLeR: Effective Reward Shaping in Offline Reinforcement Learning for Recommender Systems | Code | 0
Robust Offline Reinforcement learning with Heavy-Tailed Rewards | Code | 0
RL Unplugged: A Collection of Benchmarks for Offline Reinforcement Learning | Code | 0
Robust On-Policy Sampling for Data-Efficient Policy Evaluation in Reinforcement Learning | Code | 0
Revisiting Bellman Errors for Offline Model Selection | Code | 0
Robust Reinforcement Learning Objectives for Sequential Recommender Systems | Code | 0
Scalable Decision-Making in Stochastic Environments through Learned Temporal Abstraction | Code | 0
Bayes Adaptive Monte Carlo Tree Search for Offline Model-based Reinforcement Learning | Code | 0
AlignIQL: Policy Alignment in Implicit Q-Learning through Constrained Optimization | Code | 0
Experimental evaluation of offline reinforcement learning for HVAC control in buildings | Code | 0
Explaining RL Decisions with Trajectories | Code | 0
Regularizing a Model-based Policy Stationary Distribution to Stabilize Offline Reinforcement Learning | Code | 0
Q-Value Weighted Regression: Reinforcement Learning with Limited Data | Code | 0
VIPeR: Provably Efficient Algorithm for Offline RL with Neural Function Approximation | Code | 0
PyTupli: A Scalable Infrastructure for Collaborative Offline Reinforcement Learning Projects | Code | 0
Solving Offline Reinforcement Learning with Decision Tree Regression | Code | 0
Policy-regularized Offline Multi-objective Reinforcement Learning | Code | 0
POPO: Pessimistic Offline Policy Optimization | Code | 0
Policy Constraint by Only Support Constraint for Offline Reinforcement Learning | Code | 0
Compositional Conservatism: A Transductive Approach in Offline Reinforcement Learning | Code | 0
Preference-Guided Reflective Sampling for Aligning Language Models | Code | 0
POCE: Primal Policy Optimization with Conservative Estimation for Multi-constraint Offline Reinforcement Learning | Code | 0
Optimistic Critic Reconstruction and Constrained Fine-Tuning for General Offline-to-Online RL | Code | 0
On the Effectiveness of Offline RL for Dialogue Response Generation | Code | 0
Active Advantage-Aligned Online Reinforcement Learning with Offline Data | Code | 0
On Practical Reinforcement Learning: Provable Robustness, Scalability, and Statistical Efficiency | Code | 0
A Survey on Offline Reinforcement Learning: Taxonomy, Review, and Open Problems | Code | 0
Offline RL with Smooth OOD Generalization in Convex Hull and its Neighborhood | Code | 0
Off-policy Evaluation in Doubly Inhomogeneous Environments | Code | 0
DR-SAC: Distributionally Robust Soft Actor-Critic for Reinforcement Learning under Uncertainty | Code | 0
MOBODY: Model Based Off-Dynamics Offline Reinforcement Learning | Code | 0
Offline RL With Resource Constrained Online Deployment | Code | 0
CAWR: Corruption-Averse Advantage-Weighted Regression for Robust Policy Optimization | Code | 0
Double Check Your State Before Trusting It: Confidence-Aware Bidirectional Offline Model-Based Imagination | Code | 0
Offline Equilibrium Finding | Code | 0
Offline Data Enhanced On-Policy Policy Gradient with Provable Guarantees | Code | 0
Fat-to-Thin Policy Optimization: Offline RL with Sparse Policies | Code | 0
Building Persona Consistent Dialogue Agents with Offline Reinforcement Learning | Code | 0
NetworkGym: Reinforcement Learning Environments for Multi-Access Traffic Management in Network Simulation | Code | 0
Multi-Game Decision Transformers | Code | 0
Diffusion Models as Optimizers for Efficient Planning in Offline RL | Code | 0
Continual Task Learning through Adaptive Policy Self-Composition | Code | 0
Mutual Information Regularized Offline Reinforcement Learning | Code | 0
RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning | Code | 0
Offline Reinforcement Learning from Datasets with Structured Non-Stationarity | Code | 0
Bridging Distributionally Robust Learning and Offline RL: An Approach to Mitigate Distribution Shift and Partial Data Coverage | Code | 0
BRAC+: Improved Behavior Regularized Actor Critic for Offline Reinforcement Learning | Code | 0
Two-step reinforcement learning for model-free redesign of nonlinear optimal regulator | Code | 0
Model-based Offline Policy Optimization with Adversarial Network | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | KFC | Average Reward | 81.8 | | Unverified
2 | ADMPO | Average Reward | 81 | | Unverified
3 | Decision Transformer (DT) | Average Reward | 73.5 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ParPI | D4RL Normalized Score | 151.4 | | Unverified