Rethinking Adversarial Transferability from a Data Distribution Perspective

2021-09-29 · ICLR 2022

Yao Zhu, Jiacheng Sun, Zhenguo Li


Abstract

Adversarial transferability enables attackers to generate adversarial examples on a source model and use them to attack a target model, which has raised security concerns about deploying DNNs in practice. In this paper, we rethink adversarial transferability from a data distribution perspective and further enhance transferability through score-matching-based optimization. We identify that some samples, once injected with small Gaussian noise, can fool different target models, and that their adversarial examples crafted on different source models have much stronger transferability. We hypothesize that these samples lie in low-density regions of the ground-truth distribution, where models are not well trained. To improve the attack success rate of adversarial examples, we align the adversarial attack with the directions that effectively decrease the ground-truth density. We propose the Intrinsic Adversarial Attack (IAA), which smooths the activation function and decreases the impact of the later layers of a given normal model, to increase the alignment between the adversarial attack and the gradient of the joint data distribution. We conduct comprehensive transferable attacks against multiple DNNs and show that IAA boosts the transferability of the crafted attacks in all cases and goes beyond state-of-the-art methods.
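
The two model modifications named in the abstract (a smoothed activation and a reduced contribution from later layers) can be illustrated with a minimal sketch. The abstract does not say which smooth activation is used or how later layers are down-weighted, so the choices below are assumptions for illustration only: Softplus stands in for the smoothed activation, and a scaling factor on a residual branch stands in for the reduced layer impact. The names `softplus`, `softplus_grad`, `scaled_residual`, `beta`, and `lam` are hypothetical and are not taken from the paper.

```python
import math


def relu(x: float) -> float:
    """Standard ReLU; its derivative jumps from 0 to 1 at x = 0."""
    return max(0.0, x)


def softplus(x: float, beta: float = 1.0) -> float:
    """Smooth stand-in for ReLU (assumed choice, not confirmed by the abstract).

    Computes log(1 + exp(beta * x)) / beta in a numerically stable form.
    Larger beta brings the curve closer to ReLU.
    """
    z = beta * x
    return (max(z, 0.0) + math.log1p(math.exp(-abs(z)))) / beta


def softplus_grad(x: float, beta: float = 1.0) -> float:
    """Derivative of softplus: sigmoid(beta * x), continuous everywhere."""
    z = beta * x
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def scaled_residual(x: float, branch, lam: float) -> float:
    """Residual unit x + lam * branch(x).

    lam < 1 shrinks the branch's contribution; this is one assumed way to
    decrease the impact of a later layer.
    """
    return x + lam * branch(x)


if __name__ == "__main__":
    for x in (-1.0, 0.0, 1.0):
        print(x, relu(x), softplus(x, beta=5.0), softplus_grad(x, beta=5.0))
    print(scaled_residual(2.0, lambda v: 3.0 * v, lam=0.5))
```

In this sketch, `beta` controls how closely the smooth activation tracks ReLU and `lam` controls how much a branch contributes; how such parameters would be selected for an actual attack is not described in the abstract.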

Tasks

Adversarial Attack
