SOTAVerified

Offline RL

Papers

Showing 226–250 of 755 papers

| Title | Status | Hype |
|---|---|---|
| Uncertainty-aware Distributional Offline Reinforcement Learning | | 0 |
| Reinforcement Learning-based Recommender Systems with Large Language Models for State Reward and Action Modeling | | 0 |
| The Value of Reward Lookahead in Reinforcement Learning | | 0 |
| Minimax Optimal and Computationally Efficient Algorithms for Distributionally Robust Offline Reinforcement Learning | | 0 |
| Towards Optimizing Human-Centric Objectives in AI-Assisted Decision-Making With Offline Reinforcement Learning | | 0 |
| Why Online Reinforcement Learning is Causal | | 0 |
| Offline Fictitious Self-Play for Competitive Games | | 0 |
| Unsupervised Zero-Shot Reinforcement Learning via Functional Reward Encodings | Code | 2 |
| Trajectory-wise Iterative Reinforcement Learning Framework for Auto-bidding | | 0 |
| Deep Generative Models for Offline Policy Learning: Tutorial, Survey, and Perspectives on Future Directions | Code | 2 |
| MORE-3S: Multimodal-based Offline Reinforcement Learning with Shared Semantic Spaces | Code | 0 |
| Align Your Intents: Offline Imitation Learning via Optimal Transport | | 0 |
| Offline Multi-task Transfer RL with Representational Penalization | | 0 |
| Learning Goal-Conditioned Policies from Sub-Optimal Offline Data via Metric Learning | | 0 |
| Universal Black-Box Reward Poisoning Attack against Offline Reinforcement Learning | | 0 |
| Measurement Scheduling for ICU Patients with Offline Reinforcement Learning | | 0 |
| Stitching Sub-Trajectories with Conditional Diffusion Model for Goal-Conditioned Offline RL | Code | 1 |
| More Benefits of Being Distributional: Second-Order Bounds for Reinforcement Learning | | 0 |
| Federated Offline Reinforcement Learning: Collaborative Single-Policy Coverage Suffices | | 0 |
| Real-World Fluid Directed Rigid Body Control via Deep Reinforcement Learning | | 0 |
| Offline Actor-Critic Reinforcement Learning Scales to Large Models | | 0 |
| A Primal-Dual Algorithm for Offline Constrained Reinforcement Learning with Linear MDPs | | 0 |
| Entropy-regularized Diffusion Policy with Q-Ensembles for Offline Reinforcement Learning | Code | 1 |
| SEABO: A Simple Search-Based Method for Offline Imitation Learning | Code | 1 |
| Contrastive Diffuser: Planning Towards High Return States via Contrastive Learning | | 0 |
Page 10 of 31

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | KFC | Average Reward | 81.8 | | Unverified |
| 2 | ADMPO | Average Reward | 81 | | Unverified |
| 3 | Decision Transformer (DT) | Average Reward | 73.5 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | ParPI | D4RL Normalized Score | 151.4 | | Unverified |