
Studying Generalisability across Abusive Language Detection Datasets

CoNLL 2019 · 2019-11-01

Steve Durairaj Swamy, Anupam Jamatia, Björn Gambäck


Abstract

Work on Abusive Language Detection has tackled a wide range of subtasks and domains. As a result, there is considerable redundancy and non-generalisability between datasets. Through experiments on cross-dataset training and testing, the paper reveals that the preconceived notion of including more non-abusive samples in a dataset (to emulate reality) may have a detrimental effect on the generalisability of a model trained on that data. Hence a hierarchical annotation model is utilised to reveal redundancies in existing datasets and to help reduce redundancy in future annotation efforts.
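The cross-dataset setup described in the abstract can be sketched in a few lines: train on one dataset and evaluate on another, then compare against in-dataset performance. The toy keyword classifier and the stand-in datasets below are illustrative assumptions, not the paper's actual models or corpora.

```python
# Hedged sketch of cross-dataset evaluation for abusive language detection.
# Labels: 1 = abusive, 0 = non-abusive. The classifier is a deliberately
# crude lexicon model, used only to show the train-on-A / test-on-B pattern.

def train_keyword_model(samples):
    """Build a lexicon of tokens that appear only in abusive examples."""
    abusive, benign = set(), set()
    for text, label in samples:
        (abusive if label else benign).update(text.lower().split())
    return abusive - benign

def predict(lexicon, text):
    """Flag a text as abusive if any token is in the learned lexicon."""
    return int(any(tok in lexicon for tok in text.lower().split()))

def cross_dataset_accuracy(train_set, test_set):
    """Train on one dataset, report accuracy on another."""
    lexicon = train_keyword_model(train_set)
    correct = sum(predict(lexicon, t) == y for t, y in test_set)
    return correct / len(test_set)

# Toy stand-ins for two datasets (hypothetical, not the paper's data).
dataset_a = [("you are an idiot", 1), ("have a nice day", 0),
             ("idiot troll", 1), ("lovely weather today", 0)]
dataset_b = [("what an idiot", 1), ("nice day out", 0)]

print(cross_dataset_accuracy(dataset_a, dataset_b))
```

In the paper's setting the same pattern is run over real abusive-language corpora, where a drop from in-dataset to cross-dataset scores indicates limited generalisability.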
