Defense against Synonym Substitution-based Adversarial Attacks via Dirichlet Neighborhood Ensemble

2021-08-01 · ACL 2021 · Code Available

Yi Zhou, Xiaoqing Zheng, Cho-Jui Hsieh, Kai-Wei Chang, Xuanjing Huang


Code

github.com/dugu9sword/dne
Official · PyTorch · ★ 18

Abstract

Although deep neural networks have achieved prominent performance on many NLP tasks, they are vulnerable to adversarial examples. We propose Dirichlet Neighborhood Ensemble (DNE), a randomized method for training a robust model to defend against synonym substitution-based attacks. During training, DNE forms virtual sentences by sampling an embedding vector for each word in an input sentence from the convex hull spanned by the word and its synonyms, and it augments the training data with these virtual sentences. In this way, the model becomes robust to adversarial attacks while maintaining its performance on the original clean data. DNE is agnostic to the network architecture and scales to large models (e.g., BERT) for NLP applications. Through extensive experimentation, we demonstrate that our method consistently outperforms recently proposed defense methods by a significant margin across different network architectures and multiple data sets.
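
The sampling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see the official repository for that). It assumes a symmetric Dirichlet distribution with a single concentration parameter `alpha`, and it uses made-up 2-dimensional embeddings and a made-up synonym table. The names `sample_dirichlet`, `sample_virtual_embedding`, `virtual_sentence`, `toy_embeddings` and `toy_synonyms` are hypothetical. A Dirichlet draw gives non-negative weights that sum to 1, so the weighted sum of the word and synonym vectors always lies inside their convex hull.

```python
import random

# Hypothetical toy data: 2-d embeddings and a synonym table (not from the paper).
toy_embeddings = {
    "good": [1.0, 0.0],
    "great": [0.8, 0.4],
    "fine": [0.6, -0.2],
    "movie": [0.0, 1.0],
}
toy_synonyms = {"good": ["great", "fine"]}


def sample_dirichlet(alphas, rng):
    """Draw one Dirichlet(alphas) sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def sample_virtual_embedding(vectors, alpha=1.0, rng=None):
    """Return a random convex combination of `vectors`.

    `vectors` holds the word's own embedding followed by its synonyms'.
    A symmetric `alpha` is an assumption made for this sketch.
    """
    rng = rng or random.Random()
    weights = sample_dirichlet([alpha] * len(vectors), rng)
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]


def virtual_sentence(tokens, alpha=1.0, rng=None):
    """Build one virtual sentence: one sampled embedding per input token."""
    rng = rng or random.Random()
    sentence = []
    for tok in tokens:
        neighborhood = [tok] + toy_synonyms.get(tok, [])
        vectors = [toy_embeddings[t] for t in neighborhood]
        sentence.append(sample_virtual_embedding(vectors, alpha, rng))
    return sentence
```

Calling `virtual_sentence(["good", "movie"])` several times yields a different embedding for "good" on each call, while "movie", which has no synonyms in the toy table, always keeps its own vector. In a training loop, such virtual sentences would be fed to the model alongside the original examples.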

Tasks

Sentence
