SOTAVerified

Offline RL

Papers

Showing 601–650 of 755 papers

Title | Status | Hype
Why So Pessimistic? Estimating Uncertainties for Offline RL through Ensembles, and Why Their Independence Matters |  | 0
Yes, Q-learning Helps Offline In-Context RL |  | 0
You Can't Count on Luck: Why Decision Transformers and RvS Fail in Stochastic Environments |  | 0
You Only Evaluate Once: a Simple Baseline Algorithm for Offline RL |  | 0
Your Offline Policy is Not Trustworthy: Bilevel Reinforcement Learning for Sequential Portfolio Optimization |  | 0
PerSim: Data-Efficient Offline Reinforcement Learning with Heterogeneous Agents via Personalized Simulators |  | 0
Pessimism in the Face of Confounders: Provably Efficient Offline Reinforcement Learning in Partially Observable Markov Decision Processes |  | 0
Pessimism Meets Risk: Risk-Sensitive Offline Reinforcement Learning |  | 0
Pessimism meets VCG: Learning Dynamic Mechanism Design via Offline Reinforcement Learning |  | 0
Pessimistic Model-based Offline Reinforcement Learning under Partial Coverage |  | 0
Pessimistic Nonlinear Least-Squares Value Iteration for Offline Reinforcement Learning |  | 0
Pessimistic Q-Learning for Offline Reinforcement Learning: Towards Optimal Sample Complexity |  | 0
π2vec: Policy Representations with Successor Features |  | 0
Planning to Go Out-of-Distribution in Offline-to-Online Reinforcement Learning |  | 0
Policy Agnostic RL: Offline RL and Online RL Fine-Tuning of Any Class and Backbone |  | 0
Policy-Based Trajectory Clustering in Offline Reinforcement Learning |  | 0
Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning |  | 0
Policy Gradients Incorporating the Future |  | 0
Policy-Guided Causal State Representation for Offline Reinforcement Learning Recommendation |  | 0
Policy Regularization on Globally Accessible States in Cross-Dynamics Reinforcement Learning |  | 0
Preference Elicitation for Offline Reinforcement Learning |  | 0
Preferred-Action-Optimized Diffusion Policies for Offline Reinforcement Learning |  | 0
Preserving Expert-Level Privacy in Offline Reinforcement Learning |  | 0
Pretraining a Shared Q-Network for Data-Efficient Offline Reinforcement Learning |  | 0
Prioritized Trajectory Replay: A Replay Memory for Data-driven Reinforcement Learning |  | 0
PROGRESSOR: A Perceptually Guided Reward Estimator with Self-Supervised Online Refinement |  | 0
Prompting Decision Transformer for Few-Shot Policy Generalization |  | 0
Provable Benefit of Multitask Representation Learning in Reinforcement Learning |  | 0
What can online reinforcement learning with function approximation benefit from general coverage conditions? |  | 0
Gauss-Newton Temporal Difference Learning with Nonlinear Function Approximation |  | 0
Provably Efficient Offline Reinforcement Learning with Trajectory-Wise Reward |  | 0
Provably Efficient Offline Reinforcement Learning with Perturbed Data Sources |  | 0
Provably Efficient Representation Selection in Low-rank Markov Decision Processes: From Online to Offline RL |  | 0
Pruning the Way to Reliable Policies: A Multi-Objective Deep Q-Learning Approach to Critical Care |  | 0
Q-learning Decision Transformer: Leveraging Dynamic Programming for Conditional Sequence Modelling in Offline RL |  | 0
Q-SFT: Q-Learning for Language Models via Supervised Fine-Tuning |  | 0
Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions |  | 0
Q-value Regularized Decision ConvFormer for Offline Reinforcement Learning |  | 0
Real Robot Challenge 2022: Learning Dexterous Manipulation from Offline Data in the Real World |  | 0
The Smart Buildings Control Suite: A Diverse Open Source Benchmark to Evaluate and Scale HVAC Control Policies for Sustainability |  | 0
Real-World Fluid Directed Rigid Body Control via Deep Reinforcement Learning |  | 0
Real-World Offline Reinforcement Learning from Vision Language Model Feedback |  | 0
Offline Minimax Soft-Q-learning Under Realizability and Partial Coverage |  | 0
Regularized Behavior Value Estimation |  | 0
Reinforced Self-Training (ReST) for Language Modeling |  | 0
Reinforcement Learning: An Overview |  | 0
Reinforcement Learning-based Recommender Systems with Large Language Models for State Reward and Action Modeling |  | 0
Reinforcement Learning for Individual Optimal Policy from Heterogeneous Data |  | 0
Reinforcement Learning with Human Feedback: Learning Dynamic Choices via Pessimism |  | 0
Reliable validation of Reinforcement Learning Benchmarks |  | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | KFC | Average Reward | 81.8 |  | Unverified
2 | ADMPO | Average Reward | 81 |  | Unverified
3 | Decision Transformer (DT) | Average Reward | 73.5 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ParPI | D4RL Normalized Score | 151.4 |  | Unverified