A Closer Look at Advantage-Filtered Behavioral Cloning in High-Noise Datasets
Jake Grigsby, Yanjun Qi
Official code (PyTorch):
- github.com/jakegrigsby/deep_control
- github.com/jakegrigsby/super_sac
- github.com/jakegrigsby/cc-afbc
Abstract
Recent Offline Reinforcement Learning methods have succeeded in learning high-performance policies from fixed datasets of experience. A particularly effective approach learns to first identify and then mimic optimal decision-making strategies. Our work evaluates this method's ability to scale to vast datasets consisting almost entirely of sub-optimal noise. A thorough investigation on a custom benchmark helps identify several key challenges involved in learning from high-noise datasets. We re-purpose prioritized experience sampling to locate expert-level demonstrations among millions of low-performance samples. This modification enables offline agents to learn state-of-the-art policies in benchmark tasks using datasets where expert actions are outnumbered nearly 65:1.
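The abstract describes re-purposing prioritized experience sampling so that rare expert-level transitions are drawn far more often than the low-performance majority. A minimal sketch of that idea, assuming a softmax-over-advantages priority (the function name, temperature parameter, and toy dataset below are illustrative, not the paper's exact implementation):

```python
import numpy as np

def prioritized_indices(advantages, batch_size, temperature=1.0, rng=None):
    """Sample transition indices with probability increasing in advantage,
    so the few expert-level transitions dominate each minibatch even when
    they are vastly outnumbered by sub-optimal noise."""
    rng = rng or np.random.default_rng(0)
    a = np.asarray(advantages, dtype=np.float64)
    logits = (a - a.max()) / temperature  # shift for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return rng.choice(len(a), size=batch_size, p=probs)

# Toy dataset mirroring the ~65:1 noise-to-expert ratio from the abstract:
# 650 low-advantage "noise" transitions and 10 high-advantage expert ones.
advantages = np.concatenate([np.full(650, -1.0), np.full(10, 3.0)])
batch = prioritized_indices(advantages, batch_size=256, temperature=0.5)

# Despite experts being ~1.5% of the data, they fill most of the batch.
expert_fraction = np.mean(batch >= 650)
```

A behavioral-cloning update would then be computed only on the sampled indices, which is what lets the cloning loss concentrate on near-optimal behavior rather than the noisy majority.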