SOTAVerified

Offline RL

Papers

Showing 651–700 of 755 papers

Title | Status | Hype
Representation Balancing Offline Model-based Reinforcement Learning | | 0
Optimistic Critic Reconstruction and Constrained Fine-Tuning for General Offline-to-Online RL | Code | 0
ROLeR: Effective Reward Shaping in Offline Reinforcement Learning for Recommender Systems | Code | 0
Learning to Control Autonomous Fleets from Observation via Offline Reinforcement Learning | Code | 0
Learning from Sparse Offline Datasets via Conservative Density Estimation | Code | 0
S2P: State-conditioned Image Synthesis for Data Augmentation in Offline Reinforcement Learning | Code | 0
On the Effectiveness of Offline RL for Dialogue Response Generation | Code | 0
The Benefits of Being Distributional: Small-Loss Bounds for Reinforcement Learning | Code | 0
Latent Safety-Constrained Policy Approach for Safe Offline Reinforcement Learning | Code | 0
Is Value Functions Estimation with Classification Plug-and-play for Offline Reinforcement Learning? | Code | 0
On Practical Reinforcement Learning: Provable Robustness, Scalability, and Statistical Efficiency | Code | 0
AlignIQL: Policy Alignment in Implicit Q-Learning through Constrained Optimization | Code | 0
Off-policy Evaluation in Doubly Inhomogeneous Environments | Code | 0
Offline RL with Smooth OOD Generalization in Convex Hull and its Neighborhood | Code | 0
Offline RL With Resource Constrained Online Deployment | Code | 0
Scalable Decision-Making in Stochastic Environments through Learned Temporal Abstraction | Code | 0
POCE: Primal Policy Optimization with Conservative Estimation for Multi-constraint Offline Reinforcement Learning | Code | 0
Is Mamba Compatible with Trajectory Optimization in Offline Reinforcement Learning? | Code | 0
CAWR: Corruption-Averse Advantage-Weighted Regression for Robust Policy Optimization | Code | 0
Policy Constraint by Only Support Constraint for Offline Reinforcement Learning | Code | 0
DCUR: Data Curriculum for Teaching via Samples with Reinforcement Learning | Code | 0
Fat-to-Thin Policy Optimization: Offline RL with Sparse Policies | Code | 0
Explaining RL Decisions with Trajectories | Code | 0
Experimental evaluation of offline reinforcement learning for HVAC control in buildings | Code | 0
Offline Reinforcement Learning from Datasets with Structured Non-Stationarity | Code | 0
Policy-regularized Offline Multi-objective Reinforcement Learning | Code | 0
POPO: Pessimistic Offline Policy Optimization | Code | 0
d3rlpy: An Offline Deep Reinforcement Learning Library | Code | 0
Preference-Guided Reflective Sampling for Aligning Language Models | Code | 0
MOBODY: Model Based Off-Dynamics Offline Reinforcement Learning | Code | 0
Using Offline Data to Speed Up Reinforcement Learning in Procedurally Generated Environments | Code | 0
Offline Equilibrium Finding | Code | 0
A Connection between One-Step Regularization and Critic Regularization in Reinforcement Learning | Code | 0
Offline Data Enhanced On-Policy Policy Gradient with Provable Guarantees | Code | 0
NetworkGym: Reinforcement Learning Environments for Multi-Access Traffic Management in Network Simulation | Code | 0
The Pump Scheduling Problem: A Real-World Scenario for Reinforcement Learning | Code | 0
Semi-Markov Offline Reinforcement Learning for Healthcare | Code | 0
Semi-Offline Reinforcement Learning for Optimized Text Generation | Code | 0
Bayes Adaptive Monte Carlo Tree Search for Offline Model-based Reinforcement Learning | Code | 0
DR-SAC: Distributionally Robust Soft Actor-Critic for Reinforcement Learning under Uncertainty | Code | 0
Building Persona Consistent Dialogue Agents with Offline Reinforcement Learning | Code | 0
VIPeR: Provably Efficient Algorithm for Offline RL with Neural Function Approximation | Code | 0
The Role of Deep Learning Regularizations on Actors in Offline RL | Code | 0
Uncertainty-based Offline Variational Bayesian Reinforcement Learning for Robustness under Diverse Data Corruptions | Code | 0
Uncertainty-driven Trajectory Truncation for Data Augmentation in Offline Reinforcement Learning | Code | 0
Optimality Inductive Biases and Agnostic Guidelines for Offline Reinforcement Learning | Code | 0
PyTupli: A Scalable Infrastructure for Collaborative Offline Reinforcement Learning Projects | Code | 0
Mutual Information Regularized Offline Reinforcement Learning | Code | 0
Think-J: Learning to Think for Generative LLM-as-a-Judge | Code | 0
Bridging Distributionally Robust Learning and Offline RL: An Approach to Mitigate Distribution Shift and Partial Data Coverage | Code | 0
Page 14 of 16

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | KFC | Average Reward | 81.8 | | Unverified
2 | ADMPO | Average Reward | 81 | | Unverified
3 | Decision Transformer (DT) | Average Reward | 73.5 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ParPI | D4RL Normalized Score | 151.4 | | Unverified