SOTAVerified

Offline RL

Papers

Showing 401–450 of 755 papers

Title | Status | Hype
Offline Reinforcement Learning with Additional Covering Distributions | - | 0
Offline Primal-Dual Reinforcement Learning for Linear MDPs | - | 0
FurnitureBench: Reproducible Real-World Benchmark for Long-Horizon Complex Manipulation | Code | 2
Bayesian Reparameterization of Reward-Conditioned Reinforcement Learning with Energy-based Models | - | 0
Reward-agnostic Fine-tuning: Provable Statistical Benefits of Hybrid Reinforcement Learning | - | 0
SLiC-HF: Sequence Likelihood Calibration with Human Feedback | - | 0
Revisiting the Minimalist Approach to Offline Reinforcement Learning | Code | 1
Double Pessimism is Provably Efficient for Distributionally Robust Offline Reinforcement Learning: Generic Algorithm and Robust Partial Coverage | - | 0
Towards Generalizable Reinforcement Learning for Trade Execution | - | 0
Explaining RL Decisions with Trajectories | Code | 0
Masked Trajectory Models for Prediction, Representation, and Control | Code | 1
Federated Ensemble-Directed Offline Reinforcement Learning | Code | 1
Leveraging Factored Action Spaces for Efficient Offline Reinforcement Learning in Healthcare | Code | 1
What can online reinforcement learning with function approximation benefit from general coverage conditions? | - | 0
IDQL: Implicit Q-Learning as an Actor-Critic Method with Diffusion Policies | Code | 1
Using Offline Data to Speed Up Reinforcement Learning in Procedurally Generated Environments | Code | 0
Minimax-Optimal Reward-Agnostic Exploration in Reinforcement Learning | - | 0
Uncertainty-driven Trajectory Truncation for Data Augmentation in Offline Reinforcement Learning | Code | 0
Unified Emulation-Simulation Training Environment for Autonomous Cyber Agents | - | 0
Enabling A Network AI Gym for Autonomous Cyber Agents | - | 0
Understanding Reinforcement Learning Algorithms: The Progress from Basic Q-learning to Proximal Policy Optimization | - | 0
MAHALO: Unifying Offline Reinforcement Learning and Imitation Learning from Observations | Code | 0
Finetuning from Offline Reinforcement Learning: Challenges, Trade-offs and Practical Solutions | - | 0
Offline RL with No OOD Actions: In-Sample Learning via Implicit Value Regularization | Code | 1
Optimal Transport for Offline Imitation Learning | Code | 1
Deep RL with Hierarchical Action Exploration for Dialogue Generation | - | 0
DataLight: Offline Data-Driven Traffic Signal Control | Code | 1
Adaptive Policy Learning for Offline-to-Online Reinforcement Learning | - | 0
Deploying Offline Reinforcement Learning with Human Feedback | - | 0
Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning | Code | 1
Environment Transformer and Policy Optimization for Model-Based Offline Reinforcement Learning | - | 0
Graph Decision Transformer | - | 0
On the Sample Complexity of Vanilla Model-Based Offline Reinforcement Learning with Dependent Samples | - | 0
Decision Transformer under Random Frame Dropping | Code | 0
Learning to Influence Human Behavior with Offline Reinforcement Learning | - | 0
Learning to Control Autonomous Fleets from Observation via Offline Reinforcement Learning | Code | 0
The In-Sample Softmax for Offline Reinforcement Learning | Code | 1
The Provable Benefits of Unsupervised Data Sharing for Offline Reinforcement Learning | - | 0
Gauss-Newton Temporal Difference Learning with Nonlinear Function Approximation | - | 0
VIPeR: Provably Efficient Algorithm for Offline RL with Neural Function Approximation | Code | 0
Neural Laplace Control for Continuous-time Delayed Systems | Code | 1
Behavior Proximal Policy Optimization | Code | 1
Swapped goal-conditioned offline reinforcement learning | Code | 1
Dual RL: Unification and New Methods for Reinforcement and Imitation Learning | Code | 1
Deep Offline Reinforcement Learning for Real-world Treatment Optimization Applications | - | 0
Language Decision Transformers with Exponential Tilt for Interactive Text Environments | - | 0
A Strong Baseline for Batch Imitation Learning | - | 0
Offline Minimax Soft-Q-learning Under Realizability and Partial Coverage | - | 0
Selective Uncertainty Propagation in Offline RL | - | 0
Revisiting Bellman Errors for Offline Model Selection | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | KFC | Average Reward | 81.8 | - | Unverified
2 | ADMPO | Average Reward | 81 | - | Unverified
3 | Decision Transformer (DT) | Average Reward | 73.5 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ParPI | D4RL Normalized Score | 151.4 | - | Unverified