
Defuse: Debugging Classifiers Through Distilling Unrestricted Adversarial Examples

2021-01-01

Dylan Z Slack, Nathalie Rauschmayr, Krishnaram Kenthapadi

Abstract

With the growing proliferation of machine learning models, the need to diagnose and correct bugs in models has become increasingly clear. As a route to better discover and fix model bugs, we propose failure scenarios: regions on the data manifold that are incorrectly classified by a model. We propose an end-to-end debugging framework called Defuse that uses these regions to fix faulty classifier predictions. The Defuse framework works in three steps. First, Defuse identifies many unrestricted adversarial examples—naturally occurring instances that are misclassified—using a generative model. Next, the procedure distills the misclassified data using clustering. This step both reveals sources of model error and helps facilitate labeling. Last, the method corrects model behavior on the distilled scenarios through an optimization-based approach. We illustrate the utility of our framework on a variety of image datasets. We find that Defuse identifies and resolves concerning predictions while maintaining model generalization.
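The three-step pipeline summarized above can be sketched in code. This is a minimal illustration, not the paper's implementation: the decoder `G`, the deliberately buggy classifier `f`, the labeling `oracle`, and the small numpy k-means are all hypothetical stand-ins for the paper's generative model, classifier, annotator, and clustering step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-ins (NOT the paper's models): a decoder G mapping latent
# codes to inputs, a buggy classifier f, and an oracle giving true labels.
def G(z):
    return z                                  # identity "decoder" for the sketch

def f(x):
    return (x[:, 0] > 0).astype(int)          # classifier looks only at dim 0

def oracle(x):
    return (x.sum(axis=1) > 0).astype(int)    # ground truth uses both dims

def kmeans(points, k, iters=20, seed=0):
    """Minimal Lloyd's k-means, standing in for the paper's clustering step."""
    r = np.random.default_rng(seed)
    centers = points[r.choice(len(points), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = points[labels == j].mean(axis=0)
    return centers, labels

# Step 1: sample latent codes, decode, keep instances the classifier gets wrong.
# These misclassified, naturally occurring points are the unrestricted
# adversarial examples.
z = rng.normal(size=(2000, 2))
x = G(z)
mis = z[f(x) != oracle(x)]

# Step 2: distill the misclassified latents into a few failure scenarios.
centers, labels = kmeans(mis, k=3)
print(centers.shape)                          # each row: one failure scenario

# Step 3 (not shown): an annotator labels the distilled scenarios, and the
# classifier is finetuned on them while preserving original accuracy.
```

Each cluster center summarizes one region of the data manifold where the classifier fails, which is what makes the scenarios cheap to inspect and label before the correction step.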
