
Second-Order Adversarial Attack and Certifiable Robustness

2019-05-01 · ICLR 2019

Bai Li, Changyou Chen, Wenlin Wang, Lawrence Carin

Abstract

Adversarial training has been recognized as a strong defense against adversarial attacks. In this paper, we propose a powerful second-order attack method that reduces the accuracy of the defense model of Madry et al. (2017). We demonstrate that adversarial training overfits to the choice of norm, in the sense that it is only robust to the attack used during adversarial training, suggesting it has not achieved universal robustness. The effectiveness of our attack method motivates an investigation into the provable robustness of a defense model. To this end, we introduce a framework that allows one to obtain a certifiable lower bound on the prediction accuracy against adversarial examples. We conduct experiments to show the effectiveness of our attack method. At the same time, our defense model achieves significant improvements over previous work under our proposed attack.
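For context on the kind of attack being strengthened here, the following is a minimal sketch of the standard first-order PGD attack (Madry et al., 2017) that the abstract uses as its baseline; it is not the paper's second-order method. The toy linear classifier, its margin loss, and all parameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def pgd_attack(x, y, grad_fn, eps=0.1, alpha=0.02, steps=10):
    """Projected gradient descent under an L-infinity ball of radius eps.

    grad_fn(x, y) returns the gradient of the loss w.r.t. the input x;
    each step ascends the loss and projects back into the eps-ball.
    """
    x_adv = x.copy()
    for _ in range(steps):
        g = grad_fn(x_adv, y)
        x_adv = x_adv + alpha * np.sign(g)        # signed gradient ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project onto the eps-ball
    return x_adv

# Toy linear classifier (illustrative): margin loss L(x, y) = -y * (w . x),
# so the gradient of the loss with respect to x is simply -y * w.
w = np.array([1.0, -2.0])

def grad(x, y):
    return -y * w

x0 = np.array([0.5, 0.3])
x_adv = pgd_attack(x0, y=1.0, grad_fn=grad, eps=0.1)
```

Because PGD perturbs along `sign(g)` and projects with `clip`, the returned `x_adv` stays within `eps` of `x0` in the L-infinity norm while the loss increases; the paper's point is that training against only this one attack/norm does not confer robustness to other attacks.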
