Defending Pre-trained Language Models from Adversarial Word Substitutions Without Performance Sacrifice

2021-05-30

Rongzhou Bao, Jiayi Wang, Hai Zhao

Code

github.com/LilyNLP/ADFAR
Official · In paper · PyTorch · ★ 9

Abstract

Pre-trained contextualized language models (PrLMs) have led to strong performance gains in downstream natural language understanding tasks. However, PrLMs can still be easily fooled by adversarial word substitution, which is one of the most challenging textual adversarial attack methods. Existing defense approaches suffer from notable performance loss and added complexity. This paper therefore presents a compact, performance-preserving framework, Anomaly Detection with Frequency-Aware Randomization (ADFAR). Specifically, we design an auxiliary anomaly detection classifier and adopt a multi-task learning procedure, which enables PrLMs to distinguish adversarial input samples. Then, to defend against adversarial word substitution, a frequency-aware randomization process is applied to the input samples recognized as adversarial. Empirical results show that ADFAR significantly outperforms recently proposed defense methods across various tasks, with much higher inference speed. Remarkably, ADFAR does not impair the overall performance of PrLMs. The code is available at https://github.com/LilyNLP/ADFAR
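
The two-stage idea in the abstract (flag a suspicious input first, then randomize only flagged inputs) can be illustrated with a minimal Python sketch. This is not the authors' implementation, which lives in the linked repository. Everything named here is a hypothetical stand-in: the toy `WORD_FREQ` and `SYNONYMS` tables, the `rare_threshold` and `replace_prob` parameters, and the `is_adversarial` and `classify` callables that stand in for the anomaly detection head and the task classifier. The choice to swap rare words for more frequent synonyms is also an assumption about what "frequency-aware" could mean, since the abstract does not spell out the rule.

```python
import random
from typing import Callable, Dict, List

# Hypothetical toy resources; the abstract does not specify a frequency
# source or a synonym source.
WORD_FREQ: Dict[str, int] = {
    "good": 900, "great": 700, "fine": 400, "splendid": 12,
    "movie": 800, "film": 750, "flick": 20,
}
SYNONYMS: Dict[str, List[str]] = {
    "splendid": ["good", "great", "fine"],
    "flick": ["movie", "film"],
    "good": ["great", "fine", "splendid"],
    "movie": ["film", "flick"],
}


def frequency_aware_randomize(
    tokens: List[str],
    rng: random.Random,
    rare_threshold: int = 50,
    replace_prob: float = 1.0,
) -> List[str]:
    """Randomly replace rare words with more frequent synonyms (assumed rule)."""
    out = []
    for tok in tokens:
        freq = WORD_FREQ.get(tok, 0)
        # Only consider synonyms that are strictly more frequent than the token.
        candidates = [s for s in SYNONYMS.get(tok, []) if WORD_FREQ.get(s, 0) > freq]
        if freq < rare_threshold and candidates and rng.random() < replace_prob:
            out.append(rng.choice(candidates))
        else:
            out.append(tok)
    return out


def defend(
    tokens: List[str],
    is_adversarial: Callable[[List[str]], bool],
    classify: Callable[[List[str]], str],
    rng: random.Random,
) -> str:
    """Randomize only inputs the detector flags, then run the task classifier."""
    if is_adversarial(tokens):
        tokens = frequency_aware_randomize(tokens, rng)
    return classify(tokens)
```

In this sketch, inputs that the detector does not flag reach the classifier unchanged, which is one way to read the abstract's claim that overall performance on normal inputs is not impaired.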

Tasks

Adversarial Attack, Anomaly Detection, Multi-Task Learning, Natural Language Understanding
