Label Noise in Context

2020-07-01 · ACL 2020

Michael Desmond, Catherine Finegan-Dollak, Jeff Boston, Matt Arnold

Abstract

Label noise (incorrectly or ambiguously labeled training examples) can negatively impact model performance. Although noise detection techniques have been around for decades, practitioners rarely apply them, as manual noise remediation is a tedious process. Examples incorrectly flagged as noise waste reviewers' time, and correcting label noise without guidance can be difficult. We propose LNIC, a noise-detection method that uses an example's neighborhood within the training set to (a) reduce false positives and (b) provide an explanation as to why the example was flagged as noise. We demonstrate on several short-text classification datasets that LNIC outperforms the state of the art on measures of precision and F0.5 score. We also show how LNIC's training set context helps a reviewer to understand and correct label noise in a dataset. The LNIC tool lowers the barriers to label noise remediation, increasing its utility for NLP practitioners.
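
The abstract does not spell out the algorithm, so the following is only an illustrative sketch of the general idea of neighborhood-based noise detection, not the authors' LNIC implementation. It assumes examples are already embedded as numeric vectors, and it flags an example when fewer than half of its k nearest training neighbors share its label. The function names (`cosine`, `flag_noise`, `f_beta`), the cosine similarity measure, and the `k` and `min_agreement` parameters are all hypothetical choices made for illustration. The returned neighbor indices stand in for the "context" a reviewer could inspect, and `f_beta` shows why an F0.5 score rewards precision more than recall.

```python
from collections import Counter
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def flag_noise(vectors, labels, k=3, min_agreement=0.5):
    """Flag examples whose label disagrees with most of their k nearest neighbors.

    Hypothetical heuristic, not the published LNIC method. Returns a list of
    (index, suggested_label, neighbor_indices) tuples; the neighbors are the
    training-set context a reviewer can look at.
    """
    flagged = []
    for i, (vec, label) in enumerate(zip(vectors, labels)):
        # Rank every other training example by similarity to example i.
        others = [j for j in range(len(vectors)) if j != i]
        others.sort(key=lambda j: cosine(vec, vectors[j]), reverse=True)
        neighbors = others[:k]
        votes = Counter(labels[j] for j in neighbors)
        agreement = votes[label] / len(neighbors)
        if agreement < min_agreement:
            suggested = votes.most_common(1)[0][0]
            flagged.append((i, suggested, neighbors))
    return flagged


def f_beta(precision, recall, beta=0.5):
    """F-beta score; beta < 1 weights precision more heavily than recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Toy data: two clusters, with example 2 labeled "b" inside the "a" cluster.
vectors = [(1.0, 0.0), (0.9, 0.1), (0.95, 0.05), (0.8, 0.2),
           (0.0, 1.0), (0.1, 0.9), (0.05, 0.95), (0.2, 0.8)]
labels = ["a", "a", "b", "a", "b", "b", "b", "b"]

flagged = flag_noise(vectors, labels, k=3)
for index, suggested, neighbors in flagged:
    print(index, labels[index], "->", suggested, "neighbors:", neighbors)
```

On the toy data only the mislabeled example is flagged, and its neighbors all carry the label it is surrounded by. Requiring majority disagreement before flagging is one simple way to trade recall for precision, which is the trade-off an F0.5 score is designed to reward.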

Tasks

Text Classification
