Test accuracy is not all you need: Examining neural network behavior at the classification boundary

2021-09-22 · NeurIPS Workshop ICBINB 2021

Tiffany Joyce Vlaar

Abstract

We evaluate the classification of both human volunteers and various neural network models on a set of GAN-generated images that reflect the transition from one MNIST class to another. We find that models that obtain the same test accuracy on the standard MNIST test data set exhibit different behavior on these images. Further, we find that although the number of misclassified images decreases with test accuracy, the spread in predictions over multiple runs on images that are difficult to classify (for humans) also decreases with test accuracy. We conclude that test accuracy is an insufficient metric to capture the behavior of a network on images that lie along the boundary between classes.
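
The abstract does not say how the spread in predictions over multiple runs is measured. As a minimal sketch, assuming spread is taken to be the entropy of the labels that independently trained runs assign to the same image, it could be computed as below. The function name `prediction_spread`, the choice of entropy, and the example labels are illustrative assumptions, not the paper's method or data.

```python
from collections import Counter
from math import log


def prediction_spread(run_predictions):
    """Per-image entropy (in nats) of the labels predicted across runs.

    run_predictions[r][i] is the label that run r predicts for image i.
    Entropy is an assumed stand-in for "spread"; the paper may use a
    different measure.
    """
    n_runs = len(run_predictions)
    n_images = len(run_predictions[0])
    spreads = []
    for i in range(n_images):
        # Count how many runs voted for each label on image i.
        votes = Counter(run[i] for run in run_predictions)
        # Entropy of the empirical vote distribution.
        entropy = sum(-(c / n_runs) * log(c / n_runs) for c in votes.values())
        spreads.append(entropy)
    return spreads


# Hypothetical labels from 4 runs on 3 images (made up for illustration).
runs = [
    [3, 8, 4],
    [3, 3, 9],
    [3, 8, 4],
    [3, 3, 9],
]
spreads = prediction_spread(runs)
```

Under this measure, an image on which every run agrees has zero spread, while an image on which the runs split evenly between two classes has spread log 2. Two models with identical test accuracy can therefore still differ in how their spread is distributed over boundary images.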
