SOTAVerified

Offline RL

Papers

Showing 551–600 of 755 papers

Titles (each listed with a Hype count of 0):

Model-based Offline Reinforcement Learning with Lower Expectile Q-Learning
Taming OOD Actions for Offline Reinforcement Learning: An Advantage-Based Approach
Targeted Environment Design from Offline Data
The Challenges of Exploration for Offline Reinforcement Learning
The Essential Elements of Offline RL via Supervised Learning
The Least Restriction for Offline Reinforcement Learning
The Pitfalls of Imitation Learning when Actions are Continuous
The Provable Benefits of Unsupervised Data Sharing for Offline Reinforcement Learning
The reinforcement learning-based multi-agent cooperative approach for the adaptive speed regulation on a metallurgical pickling line
The Role of Coverage in Online Reinforcement Learning
The Role of Inherent Bellman Error in Offline Reinforcement Learning with Linear Function Approximation
The Value of Reward Lookahead in Reinforcement Learning
The Virtues of Pessimism in Inverse Reinforcement Learning
To Switch or Not to Switch? Balanced Policy Switching in Offline Reinforcement Learning
Toward Explainable Offline RL: Analyzing Representations in Intrinsically Motivated Decision Transformers
Towards Flexible Inference in Sequential Decision Problems via Bidirectional Transformers
Towards Generalizable Reinforcement Learning for Trade Execution
Towards Instance-Optimal Offline Reinforcement Learning with Pessimism
Towards Optimal Differentially Private Regret Bounds in Linear MDPs
Towards Optimizing Human-Centric Objectives in AI-Assisted Decision-Making With Offline Reinforcement Learning
Towards Robust Policy: Enhancing Offline Reinforcement Learning with Adversarial Attacks and Defenses
Tractable Offline Learning of Regular Decision Processes
Trajectory Data Suffices for Statistically Efficient Learning in Offline RL with Linear q^π-Realizability and Concentrability
Trajectory-wise Iterative Reinforcement Learning Framework for Auto-bidding
Transferred Q-learning
UDQL: Bridging The Gap between MSE Loss and The Optimal Value Function in Offline Reinforcement Learning
UMBRELLA: Uncertainty-Aware Model-Based Offline Reinforcement Learning Leveraging Planning
Uncertainty-Aware Decision Transformer for Stochastic Driving Environments
Uncertainty-aware Distributional Offline Reinforcement Learning
Uncertainty Regularized Policy Learning for Offline Reinforcement Learning
Uncertainty Weighted Offline Reinforcement Learning
Understanding Reinforcement Learning Algorithms: The Progress from Basic Q-learning to Proximal Policy Optimization
Unearthing Gems from Stones: Policy Optimization with Negative Sample Augmentation for LLM Reasoning
Unified Emulation-Simulation Training Environment for Autonomous Cyber Agents
Unsupervised-to-Online Reinforcement Learning
Urban-Focused Multi-Task Offline Reinforcement Learning with Contrastive Data Sharing
User-Interactive Offline Reinforcement Learning
Adaptive Q-Aid for Conditional Supervised Learning in Offline Reinforcement Learning
Value Penalized Q-Learning for Recommender Systems
Variational oracle guiding for reinforcement learning
Video-Enhanced Offline Reinforcement Learning: A Model-Based Approach
VIPO: Value Function Inconsistency Penalized Offline Reinforcement Learning
Wall Street Tree Search: Risk-Aware Planning for Offline Reinforcement Learning
Warm-Start Actor-Critic: From Approximation Error to Sub-optimality Gap
What are the Statistical Limits of Offline RL with Linear Function Approximation?
What Matters for Batch Online Reinforcement Learning in Robotics?
When Should We Prefer Offline Reinforcement Learning Over Behavioral Cloning?
Which Features are Best for Successor Features?
Why Online Reinforcement Learning is Causal
Why so pessimistic? Estimating uncertainties for offline RL through ensembles, and why their independence matters.
Page 12 of 16

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	KFC	Average Reward	81.8	—	Unverified
2	ADMPO	Average Reward	81	—	Unverified
3	Decision Transformer (DT)	Average Reward	73.5	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	ParPI	D4RL Normalized Score	151.4	—	Unverified
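The second table reports a D4RL Normalized Score. For context, D4RL maps a policy's raw episode return onto a 0–100 scale between per-environment random and expert reference returns, so scores above 100 (such as the claimed 151.4) indicate a policy that beat the expert reference. A minimal sketch of that normalization; the reference returns below are illustrative placeholders, not values from this page:

```python
def d4rl_normalized_score(raw_return: float,
                          random_return: float,
                          expert_return: float) -> float:
    """Map a raw episode return onto the D4RL 0-100 scale.

    0 corresponds to the random-policy reference return,
    100 to the expert reference return; values above 100
    mean the evaluated policy exceeded the expert reference.
    """
    return 100.0 * (raw_return - random_return) / (expert_return - random_return)


# Illustrative (hypothetical) reference returns for one environment:
score = d4rl_normalized_score(raw_return=9000.0,
                              random_return=-280.0,
                              expert_return=12135.0)
print(round(score, 1))  # 74.7
```

In practice these reference returns are fixed per environment by the benchmark, which is what makes normalized scores comparable across tasks with very different raw-reward scales.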