DALC: the Dutch Abusive Language Corpus

2021-08-01 · ACL (WOAH) 2021 · Code Available

Tommaso Caselli, Arjan Schelhaas, Marieke Weultjes, Folkert Leistra, Hylke van der Veen, Gerben Timmerman, Malvina Nissim

Code

github.com/tommasoc80/dalc
Official · In paper · none · ★ 11

Abstract

As socially unacceptable language becomes pervasive on social media platforms, the need for automatic content moderation becomes more pressing. This contribution introduces the Dutch Abusive Language Corpus (DALC v1.0), a new dataset with tweets manually annotated for abusive language. The resource addresses a gap in language resources for Dutch and adopts a multi-layer annotation scheme modeling the explicitness and the target of the abusive messages. Baseline experiments on all annotation layers have been conducted, achieving a macro F1 score of 0.748 for binary classification of the explicitness layer and 0.489 for target classification.
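
The macro F1 scores reported in the abstract are unweighted averages of per-class F1, so rare classes count as much as frequent ones. The following is a minimal sketch of that metric only, not the authors' evaluation code; the function name `macro_f1`, the label strings, and the toy predictions are hypothetical and are not taken from DALC.

```python
def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over all labels seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        pairs = list(zip(gold, pred))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels for a binary layer (illustration only, not DALC data)
gold = ["ABUSIVE", "NOT", "NOT", "ABUSIVE", "NOT", "NOT"]
pred = ["ABUSIVE", "NOT", "ABUSIVE", "NOT", "NOT", "NOT"]
print(macro_f1(gold, pred))  # per-class F1 is 0.5 and 0.75, so the mean is 0.625
```

Because each class contributes equally, a classifier that does well only on the majority class scores lower under macro F1 than under accuracy.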

Tasks

Abusive Language · Binary Classification · Classification
