SOTAVerified

Offline RL

Papers

Showing 76–100 of 755 papers

Title | Status | Hype
DRDT3: Diffusion-Refined Decision Test-Time Training Model | - | 0
SR-Reward: Taking The Path More Traveled | - | 0
On the Statistical Complexity for Offline and Low-Adaptive Reinforcement Learning with Structures | - | 0
Goal-Conditioned Data Augmentation for Offline Reinforcement Learning | - | 0
Constraint-Adaptive Policy Switching for Offline Safe Reinforcement Learning | Code | 1
Optimistic Critic Reconstruction and Constrained Fine-Tuning for General Offline-to-Online RL | Code | 0
Improving Multi-Step Reasoning Abilities of Large Language Models with Direct Advantage Policy Optimization | - | 0
Offline Reinforcement Learning for LLM Multi-Step Reasoning | Code | 2
AdaCred: Adaptive Causal Decision Transformers with Feature Crediting | - | 0
Are Expressive Models Truly Necessary for Offline RL? | Code | 1
In-Dataset Trajectory Return Regularization for Offline Preference-based Reinforcement Learning | Code | 1
Latent Safety-Constrained Policy Approach for Safe Offline Reinforcement Learning | Code | 0
Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data | Code | 2
Policy Agnostic RL: Offline RL and Online RL Fine-Tuning of Any Class and Backbone | - | 0
Reinforcement Learning: An Overview | - | 0
Finer Behavioral Foundation Models via Auto-Regressive Features and Advantage Weighting | - | 0
Improving Dynamic Object Interactions in Text-to-Video Generation with AI Feedback | - | 0
Revisiting Generative Policies: A Simpler Reinforcement Learning Algorithmic Perspective | Code | 2
Robust Offline Reinforcement Learning with Linearly Structured f-Divergence Regularization | - | 0
PROGRESSOR: A Perceptually Guided Reward Estimator with Self-Supervised Online Refinement | - | 0
Pretrained LLM Adapted with LoRA as a Decision Transformer for Offline RL in Quantitative Trading | Code | 2
LLM-Based Offline Learning for Embodied Agents via Consistency-Guided Reward Ensemble | - | 0
Preserving Expert-Level Privacy in Offline Reinforcement Learning | - | 0
Continual Task Learning through Adaptive Policy Self-Composition | Code | 0
Doubly Mild Generalization for Offline Reinforcement Learning | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | KFC | Average Reward | 81.8 | - | Unverified
2 | ADMPO | Average Reward | 81 | - | Unverified
3 | Decision Transformer (DT) | Average Reward | 73.5 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ParPI | D4RL Normalized Score | 151.4 | - | Unverified