SOTAVerified

Offline RL

Papers

Showing 426–450 of 755 papers

| Title | Status | Hype |
|---|---|---|
| Uncertainty-Aware Decision Transformer for Stochastic Driving Environments | | 0 |
| Uncertainty-aware Distributional Offline Reinforcement Learning | | 0 |
| Uncertainty Regularized Policy Learning for Offline Reinforcement Learning | | 0 |
| Uncertainty Weighted Offline Reinforcement Learning | | 0 |
| Understanding Reinforcement Learning Algorithms: The Progress from Basic Q-learning to Proximal Policy Optimization | | 0 |
| Unearthing Gems from Stones: Policy Optimization with Negative Sample Augmentation for LLM Reasoning | | 0 |
| Unified Emulation-Simulation Training Environment for Autonomous Cyber Agents | | 0 |
| Unsupervised-to-Online Reinforcement Learning | | 0 |
| Urban-Focused Multi-Task Offline Reinforcement Learning with Contrastive Data Sharing | | 0 |
| User-Interactive Offline Reinforcement Learning | | 0 |
| Adaptive Q-Aid for Conditional Supervised Learning in Offline Reinforcement Learning | | 0 |
| Value Penalized Q-Learning for Recommender Systems | | 0 |
| Variational oracle guiding for reinforcement learning | | 0 |
| Video-Enhanced Offline Reinforcement Learning: A Model-Based Approach | | 0 |
| VIPO: Value Function Inconsistency Penalized Offline Reinforcement Learning | | 0 |
| Wall Street Tree Search: Risk-Aware Planning for Offline Reinforcement Learning | | 0 |
| Warm-Start Actor-Critic: From Approximation Error to Sub-optimality Gap | | 0 |
| What are the Statistical Limits of Offline RL with Linear Function Approximation? | | 0 |
| What Matters for Batch Online Reinforcement Learning in Robotics? | | 0 |
| When Should We Prefer Offline Reinforcement Learning Over Behavioral Cloning? | | 0 |
| Which Features are Best for Successor Features? | | 0 |
| Why Online Reinforcement Learning is Causal | | 0 |
| Why so pessimistic? Estimating uncertainties for offline RL through ensembles, and why their independence matters. | | 0 |
| Why So Pessimistic? Estimating Uncertainties for Offline RL through Ensembles, and Why Their Independence Matters | | 0 |
| Yes, Q-learning Helps Offline In-Context RL | | 0 |

Benchmark Results

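| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | KFC | Average Reward | 81.8 | | Unverified |
| 2 | ADMPO | Average Reward | 81 | | Unverified |
| 3 | Decision Transformer (DT) | Average Reward | 73.5 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | ParPI | D4RL Normalized Score | 151.4 | | Unverified |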