SOTAVerified

Offline RL

Papers

Showing 51-100 of 755 papers

Title | Status | Hype
NeoRL-2: Near Real-World Benchmarks for Offline Reinforcement Learning with Extended Realistic Scenarios | Code | 1
Behaviour Discovery and Attribution for Explainable Reinforcement Learning | | 0
Evaluation-Time Policy Switching for Offline Reinforcement Learning | | 0
The Pitfalls of Imitation Learning when Actions are Continuous | | 0
Policy Regularization on Globally Accessible States in Cross-Dynamics Reinforcement Learning | | 0
Policy Constraint by Only Support Constraint for Offline Reinforcement Learning | Code | 0
Energy-Weighted Flow Matching for Offline Reinforcement Learning | | 0
What Makes a Good Diffusion Planner for Decision Making? | Code | 2
Scalable Decision-Making in Stochastic Environments through Learned Temporal Abstraction | Code | 0
Yes, Q-learning Helps Offline In-Context RL | | 0
Enhancing Offline Model-Based RL via Active Model Selection: A Bayesian Optimization Perspective | | 0
Which Features are Best for Successor Features? | | 0
Diverse Transformer Decoding for Offline Reinforcement Learning Using Financial Algorithmic Approaches | | 0
Active Advantage-Aligned Online Reinforcement Learning with Offline Data | Code | 0
Enhancing Pre-Trained Decision Transformers with Prompt-Tuning Bandits | | 0
Behavioral Entropy-Guided Dataset Generation for Offline Reinforcement Learning | | 0
OmniRL: In-Context Reinforcement Learning by Large-Scale Meta-Training in Randomized Worlds | | 0
Flow Q-Learning | Code | 3
Policy-Guided Causal State Representation for Offline Reinforcement Learning Recommendation | | 0
Resilient UAV Trajectory Planning via Few-Shot Meta-Offline Reinforcement Learning | | 0
GNN-DT: Graph Neural Network Enhanced Decision Transformer for Efficient Optimization in Dynamic Environments | Code | 1
Flexible Blood Glucose Control: Offline Reinforcement Learning from Human Feedback | | 0
Data Center Cooling System Optimization Using Offline Reinforcement Learning | | 0
Fat-to-Thin Policy Optimization: Offline RL with Sparse Policies | Code | 0
Large Language Model driven Policy Exploration for Recommender Systems | | 0
DRDT3: Diffusion-Refined Decision Test-Time Training Model | | 0
SR-Reward: Taking The Path More Traveled | | 0
On the Statistical Complexity for Offline and Low-Adaptive Reinforcement Learning with Structures | | 0
Goal-Conditioned Data Augmentation for Offline Reinforcement Learning | | 0
Constraint-Adaptive Policy Switching for Offline Safe Reinforcement Learning | Code | 1
Optimistic Critic Reconstruction and Constrained Fine-Tuning for General Offline-to-Online RL | Code | 0
Improving Multi-Step Reasoning Abilities of Large Language Models with Direct Advantage Policy Optimization | | 0
Offline Reinforcement Learning for LLM Multi-Step Reasoning | Code | 2
AdaCred: Adaptive Causal Decision Transformers with Feature Crediting | | 0
Are Expressive Models Truly Necessary for Offline RL? | Code | 1
In-Dataset Trajectory Return Regularization for Offline Preference-based Reinforcement Learning | Code | 1
Latent Safety-Constrained Policy Approach for Safe Offline Reinforcement Learning | Code | 0
Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data | Code | 2
Policy Agnostic RL: Offline RL and Online RL Fine-Tuning of Any Class and Backbone | | 0
Reinforcement Learning: An Overview | Code | 0
Finer Behavioral Foundation Models via Auto-Regressive Features and Advantage Weighting | | 0
Improving Dynamic Object Interactions in Text-to-Video Generation with AI Feedback | | 0
Revisiting Generative Policies: A Simpler Reinforcement Learning Algorithmic Perspective | Code | 2
Robust Offline Reinforcement Learning with Linearly Structured f-Divergence Regularization | | 0
PROGRESSOR: A Perceptually Guided Reward Estimator with Self-Supervised Online Refinement | | 0
Pretrained LLM Adapted with LoRA as a Decision Transformer for Offline RL in Quantitative Trading | Code | 2
LLM-Based Offline Learning for Embodied Agents via Consistency-Guided Reward Ensemble | | 0
Preserving Expert-Level Privacy in Offline Reinforcement Learning | | 0
Continual Task Learning through Adaptive Policy Self-Composition | Code | 0
Doubly Mild Generalization for Offline Reinforcement Learning | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | KFC | Average Reward | 81.8 | | Unverified
2 | ADMPO | Average Reward | 81 | | Unverified
3 | Decision Transformer (DT) | Average Reward | 73.5 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | ParPI | D4RL Normalized Score | 151.4 | | Unverified