Studying Generalisability across Abusive Language Detection Datasets
2019-11-01 · CoNLL 2019
Steve Durairaj Swamy, Anupam Jamatia, Björn Gambäck
Abstract
Work on Abusive Language Detection has tackled a wide range of subtasks and domains, and as a result there is considerable redundancy and limited generalisability between datasets. Through experiments on cross-dataset training and testing, the paper shows that the preconceived notion that a dataset should include more non-abusive samples (to emulate reality) may in fact harm the generalisability of a model trained on that data. A hierarchical annotation model is therefore utilised to reveal redundancies in existing datasets and to help reduce redundancy in future annotation efforts.
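The cross-dataset protocol the abstract refers to can be illustrated with a toy sketch: train a classifier on one dataset and evaluate it on another, then compare against in-dataset performance. Everything below is a hypothetical stand-in (the tiny keyword model, the example sentences, and the function names are illustrative assumptions, not the paper's models or corpora):

```python
from collections import Counter

# Toy stand-ins for two abusive-language datasets (label 1 = abusive).
dataset_a = [("you are awful", 1), ("have a nice day", 0),
             ("you are terrible", 1), ("great work today", 0)]
dataset_b = [("what an awful person", 1), ("lovely weather", 0),
             ("terrible and awful", 1), ("nice to meet you", 0)]

def train_keyword_model(data):
    # Count word frequencies per class (a crude bag-of-words model).
    counts = {0: Counter(), 1: Counter()}
    for text, label in data:
        counts[label].update(text.lower().split())
    return counts

def predict(model, text):
    # Score each class by summed word frequency; ties go to non-abusive.
    scores = {c: sum(model[c][w] for w in text.lower().split()) for c in model}
    return 1 if scores[1] > scores[0] else 0

def cross_dataset_accuracy(train, test):
    # Train on one dataset, evaluate on another (or the same one).
    model = train_keyword_model(train)
    return sum(predict(model, t) == y for t, y in test) / len(test)

print("A -> A:", cross_dataset_accuracy(dataset_a, dataset_a))  # in-dataset
print("A -> B:", cross_dataset_accuracy(dataset_a, dataset_b))  # cross-dataset
```

On this toy data the model scores perfectly in-dataset but drops when tested across datasets, mirroring (in miniature) the generalisability gap the paper measures.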