
On Gender Biases in Offensive Language Classification Models

2022-07-01 · NAACL (GeBNLP) 2022

Sanjana Marcé, Adam Poliak


Abstract

We explore whether neural Natural Language Processing models trained to identify offensive language in tweets contain gender biases. We add historically gendered and gender ambiguous American names to an existing offensive language evaluation set to determine whether models' predictions are sensitive or robust to gendered names. While we see some evidence that these models might be prone to biased stereotypes that men use more offensive language than women, our results indicate that these models' binary predictions might not greatly change based upon gendered names.
