Self-Supervised Contrastive Learning with Adversarial Perturbations for Robust Pretrained Language Models

2021-11-16 · ACL ARR November 2021

Anonymous

Abstract

In this paper, we present an approach to improve the robustness of BERT language models against word substitution-based adversarial attacks by leveraging adversarial perturbations for self-supervised contrastive learning. We create an efficient word-level adversarial attack and use it to finetune BERT on adversarial examples generated on the fly during training. In contrast with previous work, our method improves model robustness without using any labeled data. Experimental results show that our method improves the robustness of BERT against four different word substitution-based adversarial attacks, and that combining our method with adversarial training yields higher robustness than adversarial training alone. Because our method improves the robustness of BERT using only unlabeled data, it opens up the possibility of using large text datasets to train robust language models.
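
The abstract does not state the training objective, so the following is only a minimal sketch of one common way to set up contrastive learning between clean and adversarial inputs. It assumes an InfoNCE-style loss in which the embedding of each clean sentence should be closer to the embedding of its own adversarial variant than to the adversarial variants of other sentences in the batch. The function names (`cosine`, `info_nce_loss`), the temperature value, and the toy two-dimensional vectors are all hypothetical and are not taken from the paper; a real setup would obtain the embeddings from BERT and generate the adversarial variants with a word substitution attack.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(clean, adversarial, temperature=0.1):
    """Mean InfoNCE loss over a batch (illustrative assumption, not the paper's loss).

    clean[i] and adversarial[i] form the positive pair; adversarial[j]
    for j != i act as in-batch negatives for clean[i].
    """
    losses = []
    for i, anchor in enumerate(clean):
        logits = [cosine(anchor, adv) / temperature for adv in adversarial]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(x - top) for x in logits))
        # Negative log-probability of picking the matching adversarial example.
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings standing in for encoder outputs (hypothetical values).
clean = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # adversarial variants close to their sources
swapped = [[0.1, 0.9], [0.9, 0.1]]   # adversarial variants close to the wrong source

loss_aligned = info_nce_loss(clean, aligned)
loss_swapped = info_nce_loss(clean, swapped)
```

In this toy setting the loss is lower when each adversarial embedding stays near its clean source, which is the behavior a robustness-oriented contrastive objective is meant to encourage. No labels appear anywhere in the computation, which matches the abstract's claim that the method needs only unlabeled text.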

Tasks

Adversarial Attack, Contrastive Learning
