Defending against Whitebox Adversarial Attacks via Randomized Discretization

2019-03-25

Yuchen Zhang, Percy Liang


Code

worksheets.codalab.org/worksheets/0x822ba2f9005f49f08755a84443c76456
Official

Abstract

Adversarial perturbations dramatically decrease the accuracy of state-of-the-art image classifiers. In this paper, we propose and analyze a simple and computationally efficient defense strategy: inject random Gaussian noise, discretize each pixel, and then feed the result into any pre-trained classifier. Theoretically, we show that our randomized discretization strategy reduces the KL divergence between original and adversarial inputs, leading to a lower bound on the classification accuracy of any classifier against any (potentially whitebox) ℓ∞-bounded adversarial attack. Empirically, we evaluate our defense on adversarial examples generated by a strong iterative PGD attack. On ImageNet, our defense is more robust than adversarially-trained networks and the winning defenses of the NIPS 2017 Adversarial Attacks & Defenses competition.
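The three steps named in the abstract (inject Gaussian noise, discretize each pixel, classify) can be illustrated with a minimal sketch. This is not the authors' code: the function names `randomized_discretize` and `defended_predict`, the fixed `codebook` of candidate colors, the nearest-color rule, and the `sigma` parameter are all assumptions made for illustration, and the abstract does not say how the discrete pixel values are chosen.

```python
import random


def randomized_discretize(pixels, codebook, sigma, rng=None):
    """Add Gaussian noise to every pixel, then snap it to the nearest codebook color.

    pixels   -- list of pixels, each a tuple of channel values
    codebook -- list of candidate colors (assumed to be given; hypothetical)
    sigma    -- standard deviation of the injected noise (hypothetical name)
    """
    rng = rng or random.Random()
    discretized = []
    for pixel in pixels:
        # Step 1: inject independent Gaussian noise into each channel.
        noisy = [channel + rng.gauss(0.0, sigma) for channel in pixel]
        # Step 2: discretize by picking the closest codebook color
        # (squared Euclidean distance; an assumed rule).
        nearest = min(
            codebook,
            key=lambda color: sum((a - b) ** 2 for a, b in zip(noisy, color)),
        )
        discretized.append(tuple(nearest))
    return discretized


def defended_predict(classifier, pixels, codebook, sigma, rng=None):
    """Step 3: feed the noised, discretized image to an unmodified classifier."""
    return classifier(randomized_discretize(pixels, codebook, sigma, rng))
```

Here `classifier` stands for any callable that accepts the list of pixels, which mirrors the abstract's point that the defense is a preprocessing step placed in front of a pre-trained model rather than a change to the model itself.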

Tasks

Adversarial Attack, General Classification
