SOTAVerified

Offline RL

Papers

Showing 101-150 of 755 papers

Title | Status | Hype
Decoupled Prioritized Resampling for Offline RL | Code | 1
Harnessing Mixed Offline Reinforcement Learning Datasets via Trajectory Weighting | Code | 1
CIRS: Bursting Filter Bubbles by Counterfactual Interactive Recommender System | Code | 1
Offline Reinforcement Learning via High-Fidelity Generative Behavior Modeling | Code | 1
Making Offline RL Online: Collaborative World Models for Offline Visual Reinforcement Learning | Code | 1
Offline Reinforcement Learning with In-sample Q-Learning | Code | 1
COMBO: Conservative Offline Model-Based Policy Optimization | Code | 1
Agent-Controller Representations: Principled Offline RL with Rich Exogenous Information | Code | 1
Offline Retraining for Online RL: Decoupled Policy Learning to Mitigate Exploration Bias | Code | 1
Free from Bellman Completeness: Trajectory Stitching via Model-based Return-conditioned Supervised Learning | Code | 1
Federated Ensemble-Directed Offline Reinforcement Learning | Code | 1
Conservative Offline Distributional Reinforcement Learning | Code | 1
Conservative Q-Learning for Offline Reinforcement Learning | Code | 1
Zero-Shot Reinforcement Learning from Low Quality Data | Code | 1
Reliable Conditioning of Behavioral Cloning for Offline Reinforcement Learning | Code | 1
Consistency Models as a Rich and Efficient Policy Class for Reinforcement Learning | Code | 1
Constraint-Adaptive Policy Switching for Offline Safe Reinforcement Learning | Code | 1
Online reinforcement learning with sparse rewards through an active inference capsule | Code | 1
Alleviating Matthew Effect of Offline Reinforcement Learning in Interactive Recommendation | Code | 1
Generating Code World Models with Large Language Models Guided by Monte Carlo Tree Search | Code | 1
PDiT: Interleaving Perception and Decision-making Transformers for Deep Reinforcement Learning | Code | 1
Pessimistic Bootstrapping for Uncertainty-Driven Offline Reinforcement Learning | Code | 1
All You Need Is Supervised Learning: From Imitation Learning to Meta-RL With Upside Down RL | Code | 1
PlanDQ: Hierarchical Plan Orchestration via D-Conductor and Q-Performer | Code | 1
Optimistic Curiosity Exploration and Conservative Exploitation with Linear Reward Shaping | Code | 1
Acme: A Research Framework for Distributed Reinforcement Learning | Code | 1
Entropy-regularized Diffusion Policy with Q-Ensembles for Offline Reinforcement Learning | Code | 1
Extreme Q-Learning: MaxEnt RL without Entropy | Code | 1
GNN-DT: Graph Neural Network Enhanced Decision Transformer for Efficient Optimization in Dynamic Environments | Code | 1
Efficient Offline Policy Optimization with a Learned Model | Code | 1
A Minimalist Approach to Offline Reinforcement Learning | Code | 1
Curriculum Offline Imitation Learning | Code | 1
Doubly Mild Generalization for Offline Reinforcement Learning | Code | 1
Behavior Transformers: Cloning k modes with one stone | Code | 1
DMC-VB: A Benchmark for Representation Learning for Control with Visual Distractors | Code | 1
Adaptive Behavior Cloning Regularization for Stable Offline-to-Online Reinforcement Learning | Code | 1
Don't Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning | Code | 1
Efficient Diffusion Policies for Offline Reinforcement Learning | Code | 1
Efficient Planning in a Compact Latent Action Space | Code | 1
CROP: Conservative Reward for Model-based Offline Policy Optimization | Code | 1
Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning | Code | 1
FOCAL: Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization | Code | 1
Critic Regularized Regression | Code | 1
Behavior Proximal Policy Optimization | Code | 1
DataLight: Offline Data-Driven Traffic Signal Control | Code | 1
Critic-Guided Decision Transformer for Offline Reinforcement Learning | Code | 1
Counterfactual Conservative Q Learning for Offline Multi-agent Reinforcement Learning | Code | 1
Discriminator-Weighted Offline Imitation Learning from Suboptimal Demonstrations | Code | 1
cosFormer: Rethinking Softmax in Attention | Code | 1
When should we prefer Decision Transformers for Offline Reinforcement Learning? | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | KFC | Average Reward | 81.8 | - | Unverified
2 | ADMPO | Average Reward | 81 | - | Unverified
3 | Decision Transformer (DT) | Average Reward | 73.5 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ParPI | D4RL Normalized Score | 151.4 | - | Unverified