Uncovering the Limits of Adversarial Training against Norm-Bounded Adversarial Examples
Sven Gowal, Chongli Qin, Jonathan Uesato, Timothy Mann, Pushmeet Kohli
Code
- https://github.com/imrahulr/adversarial_robustness_pytorch (PyTorch)
- https://github.com/imrahulr/hat (PyTorch)
Abstract
Adversarial training and its variants have become de facto standards for learning robust deep neural networks. In this paper, we explore the landscape around adversarial training in a bid to uncover its limits. We systematically study the effect of different training losses, model sizes, activation functions, the addition of unlabeled data (through pseudo-labeling) and other factors on adversarial robustness. We discover that it is possible to train robust models that go well beyond state-of-the-art results by combining larger models, Swish/SiLU activations, and model weight averaging. We demonstrate large improvements on CIFAR-10 and CIFAR-100 against ℓ∞ and ℓ2 norm-bounded perturbations of size 8/255 and 128/255, respectively. In the setting with additional unlabeled data, we obtain an accuracy under attack of 65.88% against ℓ∞ perturbations of size 8/255 on CIFAR-10 (+6.35% with respect to prior art). Without additional data, we obtain an accuracy under attack of 57.20% (+3.46%). To test the generality of our findings and without any additional modifications, we obtain an accuracy under attack of 80.53% (+7.62%) against ℓ2 perturbations of size 128/255 on CIFAR-10, and of 36.88% (+8.46%) against ℓ∞ perturbations of size 8/255 on CIFAR-100. All models are available at https://github.com/deepmind/deepmind-research/tree/master/adversarial_robustness.
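The abstract attributes the gains to combining larger models, Swish/SiLU activations, and model weight averaging. As a rough illustration of the latter two ingredients, here is a minimal PyTorch sketch (assuming PyTorch to match the community implementations above); the `Swish` and `WeightAverager` names and the decay value are illustrative assumptions, not the authors' released code or exact settings.

```python
import copy
import torch
import torch.nn as nn


class Swish(nn.Module):
    """Swish/SiLU activation: f(x) = x * sigmoid(x) (equivalent to nn.SiLU)."""

    def forward(self, x):
        return x * torch.sigmoid(x)


class WeightAverager:
    """Exponential moving average (EMA) of model weights.

    The raw model keeps training; the averaged copy is the one evaluated
    for robustness. The decay of 0.995 is an illustrative value, not the
    paper's exact setting.
    """

    def __init__(self, model, decay=0.995):
        self.decay = decay
        self.average = copy.deepcopy(model)  # frozen EMA copy
        for p in self.average.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def update(self, model):
        # EMA update of the trainable parameters.
        for p_avg, p in zip(self.average.parameters(), model.parameters()):
            p_avg.mul_(self.decay).add_(p, alpha=1.0 - self.decay)
        # Copy non-trainable buffers (e.g. batch-norm statistics) directly.
        for b_avg, b in zip(self.average.buffers(), model.buffers()):
            b_avg.copy_(b)
```

Typical usage under this assumption: after each optimizer step of adversarial training, call `averager.update(model)`, and report accuracy under attack using `averager.average` rather than the raw model.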