Darkness can not drive out darkness: Investigating Bias in Hate Speech Detection Models

2022-05-01 · ACL 2022

Fatma Elsafoury

Abstract

It has become crucial to develop tools for automated hate speech and abuse detection. These tools would help to stop the bullies and the haters and provide a safer environment in which individuals, especially those from marginalized groups, can freely express themselves. However, recent research shows that machine learning models are biased and may make the right decisions for the wrong reasons. In this thesis, I set out to understand the performance of hate speech and abuse detection models and the different biases that could influence them. I show that hate speech and abuse detection models are subject not only to social bias but also to other types of bias that have not been explored before. Finally, I investigate the causal effect of social and intersectional bias on the performance and unfairness of hate speech detection models.

Tasks

Abuse Detection, Hate Speech Detection
