The Needle in the haystack: Out-distribution aware Self-training in an Open-World Setting

2021-09-29

Maximilian Augustin, Matthias Hein

Abstract

Traditional semi-supervised learning (SSL) has focused on the closed-world assumption that all unlabeled samples are task-related. In practice, this assumption is often violated when leveraging data from very large image databases that contain mostly non-task-relevant samples. While standard self-training and other established methods fail in this open-world setting, we demonstrate that our out-distribution-aware self-training (ODST) with a careful sample selection strategy can leverage unlabeled datasets with millions of samples, more than 1600 times larger than the labeled dataset and containing only about 2% task-relevant inputs. Standard and open-world SSL techniques degrade in performance as the ratio of task-relevant samples decreases and exhibit a significant distribution shift, which is problematic for AI safety, whereas ODST outperforms them in test performance, corruption robustness, and out-of-distribution detection.
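The abstract does not spell out the selection rule, so the following is only a minimal, hypothetical sketch of the general idea of out-distribution-aware self-training: pseudo-label unlabeled samples only when the classifier is confident and a separate in-distribution filter accepts them. The synthetic data, the 0.99 confidence threshold, and the nearest-neighbor in-distribution filter are all assumptions for illustration, not the authors' ODST procedure.

```python
# Hypothetical sketch (not the authors' ODST implementation): self-training
# that pseudo-labels a large unlabeled pool only where the model is confident
# AND a crude in-distribution filter accepts the sample. All data, thresholds,
# and the nearest-neighbor filter are assumptions for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)

# Small labeled set: two Gaussian classes.
X_lab = np.vstack([rng.normal([-2.0, 0.0], 1.0, size=(50, 2)),
                   rng.normal([+2.0, 0.0], 1.0, size=(50, 2))])
y_lab = np.repeat([0, 1], 50)

# Large unlabeled pool: ~2% task-relevant, the rest out-distribution noise.
n_pool = 20_000
task_mask = rng.random(n_pool) < 0.02
X_pool = rng.uniform(-8.0, 8.0, size=(n_pool, 2))
signs = rng.choice([-1.0, 1.0], size=task_mask.sum())
X_pool[task_mask] = (np.column_stack([2.0 * signs, np.zeros_like(signs)])
                     + rng.normal(0.0, 1.0, size=(task_mask.sum(), 2)))

# Assumed in-distribution score: distance to the nearest labeled sample.
nn = NearestNeighbors(n_neighbors=1).fit(X_lab)
in_dist = nn.kneighbors(X_pool)[0][:, 0] < 1.5

X_train, y_train = X_lab, y_lab
for rnd in range(3):
    clf = LogisticRegression().fit(X_train, y_train)
    probs = clf.predict_proba(X_pool)
    # Require BOTH high confidence and the in-distribution filter; confidence
    # alone would also admit far-away out-distribution points.
    selected = (probs.max(axis=1) > 0.99) & in_dist
    X_train = np.vstack([X_lab, X_pool[selected]])
    y_train = np.concatenate([y_lab, probs[selected].argmax(axis=1)])
    purity = task_mask[selected].mean() if selected.any() else float("nan")
    print(f"round {rnd}: {selected.sum()} pseudo-labels, "
          f"{purity:.1%} truly task-relevant")
```

In this toy setup the nearest-neighbor distance merely stands in for whatever out-distribution score a real system would use; the point it illustrates is that when only a small fraction of the pool is task-relevant, an explicit out-distribution filter, not confidence alone, decides whether self-training helps or drifts.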
