White-Box Attacks on Hate-speech BERT Classifiers in German with Explicit and Implicit Character Level Defense

2022-02-11

Shahrukh Khan, Mahnoor Shahid, Navdeeppal Singh


Code

github.com/shahrukhx01/adversarial-bert-german-attacks-defense
Official · In paper · PyTorch · ★ 18

Abstract

In this work, we evaluate the adversarial robustness of BERT models trained on German hate-speech datasets. We complement our evaluation with two novel white-box attacks, one at the character level and one at the word level, thereby contributing to the range of attacks available. Furthermore, we compare two novel character-level defense strategies and evaluate their robustness against each other.

Tasks

Adversarial Robustness
