Multi-attacks: Many images + the same adversarial attack → many target labels
Stanislav Fort
Code: github.com/stanislavfort/multi-attacks
Abstract
We show that we can easily design a single adversarial perturbation P that changes the class of n images X_1, X_2, …, X_n from their original, unperturbed classes c_1, c_2, …, c_n to desired (not necessarily all the same) classes c^*_1, c^*_2, …, c^*_n for up to hundreds of images and target classes at once. We call these multi-attacks. Characterizing the maximum n we can achieve under different conditions such as image resolution, we estimate the number of regions of high class confidence around a particular image in the space of pixels to be around 10^{O(100)}, posing a significant problem for exhaustive defense strategies. We show several immediate consequences of this: adversarial attacks that change the resulting class based on their intensity, and scale-independent adversarial examples. To demonstrate the redundancy and richness of class decision boundaries in the pixel space, we look for two-dimensional sections of that space that trace images and spell words using particular classes. We also show that ensembling reduces susceptibility to multi-attacks, and that classifiers trained on random labels are more susceptible. Our code is available on GitHub.
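The core optimization can be sketched in a few lines: keep the classifier frozen and run gradient descent on one shared perturbation P, minimizing the summed cross-entropy of every perturbed image against its own target class. The toy below is an illustration under our own assumptions, not the paper's experimental setup: it uses a random linear classifier over d "pixels" and constructs images whose clean logits we control, so that a feasible shared perturbation is known to exist.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup (not the paper's CNN): a frozen random linear
# classifier over d "pixels" with k classes; logits = x @ W.
n, d, k = 4, 32, 10
W = rng.normal(size=(d, k))

orig = np.array([0, 1, 2, 3])            # clean classes c_1..c_n
targets = np.array([4, 5, 6, 7])         # desired classes c*_1..c*_n

# Construct images with controlled clean logits (2 at the clean class,
# 1 at the target class, 0 elsewhere) so the multi-attack is feasible.
Z = np.zeros((n, k))
Z[np.arange(n), orig] = 2.0
Z[np.arange(n), targets] = 1.0
X = Z @ np.linalg.pinv(W)                # then X @ W == Z exactly

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def mean_ce(P):
    p = softmax((X + P) @ W)
    return -np.log(p[np.arange(n), targets]).mean()

P = np.zeros(d)                          # the single shared perturbation
lr = 0.01
loss0 = mean_ce(P)
for _ in range(3000):
    p = softmax((X + P) @ W)
    g_logits = p.copy()
    g_logits[np.arange(n), targets] -= 1.0      # d(cross-entropy)/d(logits)
    P -= lr * (g_logits @ W.T).mean(axis=0)     # chain rule into the shared P

clean_preds = (X @ W).argmax(axis=1)
attacked_preds = ((X + P) @ W).argmax(axis=1)
print(clean_preds.tolist())              # the original classes
print(attacked_preds.tolist())           # the per-image target classes
```

Because the loss is convex in P for a linear classifier, plain gradient descent suffices here; the paper attacks deep networks, where the same loop is run with autodiff through the model. Note that a single P steers all n images at once, which is the point of a multi-attack.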