SOTAVerified

Offline RL

Papers

Showing 351–400 of 755 papers

Title | Status | Hype
Discovering Multiple Solutions from a Single Task in Offline Reinforcement Learning |  | 0
Decision Mamba: A Multi-Grained State Space Model with Self-Evolution Regularization for Offline RL | Code | 0
Stabilizing Extreme Q-learning by Maclaurin Expansion | Code | 0
Self-Play with Adversarial Critic: Provable and Scalable Offline Alignment for Language Models |  | 0
UDQL: Bridging The Gap between MSE Loss and The Optimal Value Function in Offline Reinforcement Learning |  | 0
Causal prompting model-based offline reinforcement learning |  | 0
A Fast Convergence Theory for Offline Decision Making |  | 0
Preferred-Action-Optimized Diffusion Policies for Offline Reinforcement Learning |  | 0
Inverse Concave-Utility Reinforcement Learning is Inverse Game Theory |  | 0
Unified Preference Optimization: Language Model Alignment Beyond the Preference Frontier |  | 0
AlignIQL: Policy Alignment in Implicit Q-Learning through Constrained Optimization | Code | 0
Trajectory Data Suffices for Statistically Efficient Learning in Offline RL with Linear q^π-Realizability and Concentrability |  | 0
OPERA: Automatic Offline Policy Evaluation with Re-weighted Aggregates of Multiple Estimators |  | 0
Exclusively Penalized Q-learning for Offline Reinforcement Learning |  | 0
Offline Reinforcement Learning from Datasets with Structured Non-Stationarity | Code | 0
Offline RL via Feature-Occupancy Gradient Ascent |  | 0
Efficient Imitation Learning with Conservative World Models |  | 0
Is Mamba Compatible with Trajectory Optimization in Offline Reinforcement Learning? | Code | 0
Towards Robust Policy: Enhancing Offline Reinforcement Learning with Adversarial Attacks and Defenses |  | 0
Ensemble Successor Representations for Task Generalization in Offline-to-Online Reinforcement Learning |  | 0
Improving Offline Reinforcement Learning with Inaccurate Simulators |  | 0
Out-of-Distribution Adaptation in Offline RL: Counterfactual Reasoning via Causal Normalizing Flows |  | 0
Robot Air Hockey: A Manipulation Testbed for Robot Learning with Reinforcement Learning |  | 0
Generalize by Touching: Tactile Ensemble Skill Transfer for Robotic Furniture Assembly |  | 0
Offline Reinforcement Learning with Behavioral Supervisor Tuning |  | 0
An Offline Reinforcement Learning Algorithm Customized for Multi-Task Fusion in Large-Scale Recommender Systems |  | 0
Data-Incremental Continual Offline Reinforcement Learning |  | 0
TrajDeleter: Enabling Trajectory Forgetting in Offline Reinforcement Learning Agents | Code | 0
Offline Trajectory Generalization for Offline Reinforcement Learning |  | 0
Empowering Embodied Visual Tracking with Visual Foundation Models and Offline RL |  | 0
Leveraging Domain-Unlabeled Data in Offline Reinforcement Learning across Two Domains |  | 0
Generative Probabilistic Planning for Optimizing Supply Chain Networks |  | 0
Compositional Conservatism: A Transductive Approach in Offline Reinforcement Learning | Code | 0
CtRL-Sim: Reactive and Controllable Driving Agents with Offline Reinforcement Learning |  | 0
Scaling Vision-and-Language Navigation With Offline RL |  | 0
Uncertainty-aware Distributional Offline Reinforcement Learning |  | 0
Reinforcement Learning-based Recommender Systems with Large Language Models for State Reward and Action Modeling |  | 0
The Value of Reward Lookahead in Reinforcement Learning |  | 0
Minimax Optimal and Computationally Efficient Algorithms for Distributionally Robust Offline Reinforcement Learning |  | 0
Towards Optimizing Human-Centric Objectives in AI-Assisted Decision-Making With Offline Reinforcement Learning |  | 0
Why Online Reinforcement Learning is Causal |  | 0
Offline Fictitious Self-Play for Competitive Games |  | 0
Trajectory-wise Iterative Reinforcement Learning Framework for Auto-bidding |  | 0
Align Your Intents: Offline Imitation Learning via Optimal Transport |  | 0
MORE-3S: Multimodal-based Offline Reinforcement Learning with Shared Semantic Spaces | Code | 0
Offline Multi-task Transfer RL with Representational Penalization |  | 0
Learning Goal-Conditioned Policies from Sub-Optimal Offline Data via Metric Learning |  | 0
Universal Black-Box Reward Poisoning Attack against Offline Reinforcement Learning |  | 0
Measurement Scheduling for ICU Patients with Offline Reinforcement Learning |  | 0
More Benefits of Being Distributional: Second-Order Bounds for Reinforcement Learning |  | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | KFC | Average Reward | 81.8 |  | Unverified
2 | ADMPO | Average Reward | 81 |  | Unverified
3 | Decision Transformer (DT) | Average Reward | 73.5 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ParPI | D4RL Normalized Score | 151.4 |  | Unverified