SOTAVerified

BERT is Robust! A Case Against Synonym-Based Adversarial Examples in Text Classification

2021-11-16 · ACL ARR November 2021

Anonymous


Abstract

In this work, we investigate the robustness of BERT using four word substitution-based attacks. We combine a human evaluation of individual word substitutions with a probabilistic analysis to show that between 96% and 99% of the analyzed attacks do not preserve semantics, indicating that their success rests mainly on feeding poor data to the model. To confirm this further, we introduce an efficient data augmentation procedure and show that many successful attacks can be prevented by including data similar to adversarial examples during training. Compared to traditional adversarial training, our data augmentation procedure requires 30x less computation time per epoch, while achieving better performance on two out of three datasets. We introduce an additional post-processing step that reduces the success rates of state-of-the-art attacks to below 4%, 5%, and 8% on the three considered datasets. Finally, by examining constraints on word substitutions that better preserve semantics, we conclude that BERT is considerably more robust than previous research suggests.
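The augmentation idea in the abstract can be sketched as follows: generate extra training examples by randomly swapping words for synonyms, so the model sees perturbations similar to adversarial substitutions at training time. This is a minimal illustration, not the paper's implementation; the `SYNONYMS` table here is a hypothetical stand-in for the substitution candidates that the actual attacks would draw from.

```python
import random

# Hypothetical synonym table; a stand-in for the attacks' candidate sets.
SYNONYMS = {
    "movie": ["film", "picture"],
    "great": ["fantastic", "terrific"],
    "bad": ["awful", "poor"],
}

def augment(sentence, rate=0.5, rng=random):
    """Randomly replace known words with synonyms, mimicking
    the word substitutions an attacker would apply."""
    out = []
    for word in sentence.split():
        if word in SYNONYMS and rng.random() < rate:
            out.append(rng.choice(SYNONYMS[word]))
        else:
            out.append(word)
    return " ".join(out)

def augment_dataset(pairs, copies=1, rate=0.5):
    """Return the original (sentence, label) pairs plus `copies`
    perturbed variants of each, labels unchanged."""
    augmented = list(pairs)
    for sent, label in pairs:
        for _ in range(copies):
            augmented.append((augment(sent, rate), label))
    return augmented
```

Because augmentation only rewrites the training data once per epoch (rather than running an attack inside the training loop, as adversarial training does), it is far cheaper, which matches the 30x speedup claimed in the abstract.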
