PlugAT: A Plug and Play Module to Defend against Textual Adversarial Attack

2022-10-01 · COLING 2022

Rui Zheng, Rong Bao, Qin Liu, Tao Gui, Qi Zhang, Xuanjing Huang, Rui Xie, Wei Wu

Abstract

Adversarial training, which minimizes the loss on adversarially perturbed examples, has received considerable attention. However, these methods require modifying all model parameters and optimizing the model from scratch, which is parameter-inefficient and unfriendly to already deployed models. As an alternative, we propose PlugAT, a pluggable defense module that provides robust predictions by adding a few trainable parameters to the model inputs while keeping the original model frozen. To reduce the potential side effects of using defense modules, we further propose a novel forgetting-restricted adversarial training, which filters out bad adversarial examples that impair performance on the original examples. The PlugAT-equipped BERT model substantially improves robustness over several strong baselines on various text classification tasks while training only 9.1% of the parameters. We also observe that defense modules trained under the same model architecture exhibit domain adaptation ability across similar text classification datasets.
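
The two ideas in the abstract (trainable parameters added at the input of a frozen model, and a filter on adversarial examples that would hurt clean performance) can be illustrated with a toy sketch. This is not the authors' implementation: the mean-pooling linear classifier, the weights `W` and `B`, the data, and the specific filter rule (reject an update if it raises the mean loss on the clean examples) are all assumptions made for illustration, since the abstract does not specify them.

```python
import math

# Frozen toy classifier standing in for the deployed model (hypothetical values).
# W and B are never updated; only the prefix vectors are trained.
W = (1.5, -2.0)
B = 0.1

def predict(tokens, prefix):
    """Probability of class 1 with the trainable prefix prepended to the input."""
    seq = list(prefix) + list(tokens)
    pooled = [sum(v[d] for v in seq) / len(seq) for d in range(len(W))]
    z = sum(w * p for w, p in zip(W, pooled)) + B
    return 1.0 / (1.0 + math.exp(-z))

def loss(tokens, label, prefix):
    """Binary cross-entropy, clipped for numerical safety."""
    p = min(max(predict(tokens, prefix), 1e-12), 1.0 - 1e-12)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))

def mean_clean_loss(clean_set, prefix):
    return sum(loss(t, y, prefix) for t, y in clean_set) / len(clean_set)

def prefix_grad(tokens, label, prefix):
    """Gradient of the loss w.r.t. each prefix vector (model weights excluded)."""
    n = len(prefix) + len(tokens)
    err = predict(tokens, prefix) - label  # dL/dz for sigmoid + cross-entropy
    return [[err * w / n for w in W] for _ in prefix]

def train_prefix(clean_set, adv_set, prefix, lr=0.5, epochs=20):
    """Adversarial training of the prefix with an assumed forgetting filter."""
    kept = dropped = 0
    for _ in range(epochs):
        for tokens, label in adv_set:
            grad = prefix_grad(tokens, label, prefix)
            trial = [[x - lr * g for x, g in zip(vec, gvec)]
                     for vec, gvec in zip(prefix, grad)]
            # Assumed filter: drop the adversarial example if its update
            # would increase the mean loss on the clean examples.
            if mean_clean_loss(clean_set, trial) > mean_clean_loss(clean_set, prefix):
                dropped += 1
                continue
            prefix = trial
            kept += 1
    return prefix, kept, dropped

# Made-up clean and adversarial examples: (token vectors, label).
clean_set = [([(1.0, 0.0), (0.8, 0.1)], 1), ([(0.0, 1.0), (0.1, 0.9)], 0)]
adv_set = [([(0.2, 0.5), (0.8, 0.1)], 1), ([(0.6, 0.4), (0.1, 0.9)], 0)]
prefix0 = [[0.0, 0.0], [0.0, 0.0]]

before = mean_clean_loss(clean_set, prefix0)
prefix, kept, dropped = train_prefix(clean_set, adv_set, prefix0)
after = mean_clean_loss(clean_set, prefix)
print(f"clean loss {before:.4f} -> {after:.4f}; kept {kept}, dropped {dropped}")
```

Because every accepted update is checked against the clean examples, the clean loss in this sketch cannot increase during training, which is the "forgetting-restricted" property in miniature. A real setup would replace the toy classifier with a frozen pretrained model and learn the input-side parameters with an autograd framework.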

Tasks

Adversarial Attack, Domain Adaptation, Text Classification
