SOTAVerified

Offline RL

Papers

Showing 501-550 of 755 papers

Title | Status | Hype
Iteratively Refined Behavior Regularization for Offline Reinforcement Learning | | 0
Instructed Diffuser with Temporal Condition Guidance for Offline Reinforcement Learning | | 0
Mildly Constrained Evaluation Policy for Offline Reinforcement Learning | Code | 0
PEARL: Zero-shot Cross-task Preference Alignment and Robust Reward Learning for Robotic Manipulation | | 0
State Regularized Policy Optimization on Data with Dynamics Shift | | 0
Survival Instinct in Offline Reinforcement Learning | | 0
Achieving Fairness in Multi-Agent Markov Decision Processes Using Reinforcement Learning | | 0
Improving Offline RL by Blending Heuristics | | 0
Delphic Offline Reinforcement Learning under Nonidentifiable Hidden Confounding | | 0
IQL-TD-MPC: Implicit Q-Learning for Hierarchical Model Predictive Control | | 0
What is Essential for Unseen Goal Generalization of Offline Goal-conditioned RL? | Code | 0
Robust Reinforcement Learning Objectives for Sequential Recommender Systems | Code | 0
Reinforcement Learning with Human Feedback: Learning Dynamic Choices via Pessimism | | 0
Beyond Reward: Offline Preference-guided Policy Optimization | Code | 0
The Benefits of Being Distributional: Small-Loss Bounds for Reinforcement Learning | Code | 0
Offline Primal-Dual Reinforcement Learning for Linear MDPs | | 0
Offline Reinforcement Learning with Additional Covering Distributions | | 0
Bayesian Reparameterization of Reward-Conditioned Reinforcement Learning with Energy-based Models | | 0
SLiC-HF: Sequence Likelihood Calibration with Human Feedback | | 0
Reward-agnostic Fine-tuning: Provable Statistical Benefits of Hybrid Reinforcement Learning | | 0
Double Pessimism is Provably Efficient for Distributionally Robust Offline Reinforcement Learning: Generic Algorithm and Robust Partial Coverage | | 0
Towards Generalizable Reinforcement Learning for Trade Execution | | 0
Explaining RL Decisions with Trajectories | Code | 0
What can online reinforcement learning with function approximation benefit from general coverage conditions? | | 0
Using Offline Data to Speed Up Reinforcement Learning in Procedurally Generated Environments | Code | 0
Minimax-Optimal Reward-Agnostic Exploration in Reinforcement Learning | | 0
Uncertainty-driven Trajectory Truncation for Data Augmentation in Offline Reinforcement Learning | Code | 0
Unified Emulation-Simulation Training Environment for Autonomous Cyber Agents | | 0
Enabling A Network AI Gym for Autonomous Cyber Agents | | 0
Understanding Reinforcement Learning Algorithms: The Progress from Basic Q-learning to Proximal Policy Optimization | | 0
MAHALO: Unifying Offline Reinforcement Learning and Imitation Learning from Observations | Code | 0
Finetuning from Offline Reinforcement Learning: Challenges, Trade-offs and Practical Solutions | | 0
Deep RL with Hierarchical Action Exploration for Dialogue Generation | | 0
Adaptive Policy Learning for Offline-to-Online Reinforcement Learning | | 0
Deploying Offline Reinforcement Learning with Human Feedback | | 0
Graph Decision Transformer | | 0
Environment Transformer and Policy Optimization for Model-Based Offline Reinforcement Learning | | 0
On the Sample Complexity of Vanilla Model-Based Offline Reinforcement Learning with Dependent Samples | | 0
Learning to Influence Human Behavior with Offline Reinforcement Learning | | 0
Decision Transformer under Random Frame Dropping | Code | 0
Learning to Control Autonomous Fleets from Observation via Offline Reinforcement Learning | Code | 0
The Provable Benefits of Unsupervised Data Sharing for Offline Reinforcement Learning | | 0
Gauss-Newton Temporal Difference Learning with Nonlinear Function Approximation | | 0
VIPeR: Provably Efficient Algorithm for Offline RL with Neural Function Approximation | Code | 0
Deep Offline Reinforcement Learning for Real-world Treatment Optimization Applications | | 0
Language Decision Transformers with Exponential Tilt for Interactive Text Environments | | 0
A Strong Baseline for Batch Imitation Learning | | 0
Offline Minimax Soft-Q-learning Under Realizability and Partial Coverage | | 0
Selective Uncertainty Propagation in Offline RL | | 0
Revisiting Bellman Errors for Offline Model Selection | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | KFC | Average Reward | 81.8 | | Unverified
2 | ADMPO | Average Reward | 81 | | Unverified
3 | Decision Transformer (DT) | Average Reward | 73.5 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ParPI | D4RL Normalized Score | 151.4 | | Unverified