Studying Generalisability across Abusive Language Detection Datasets
2019-11-01 · CoNLL 2019
Steve Durairaj Swamy, Anupam Jamatia, Björn Gambäck
Abstract
Work on Abusive Language Detection has tackled a wide range of subtasks and domains, and as a result there is considerable redundancy and limited generalisability between datasets. Through experiments on cross-dataset training and testing, the paper shows that the preconceived notion that a dataset should include more non-abusive samples (to emulate reality) may in fact harm the generalisability of a model trained on that data. A hierarchical annotation model is therefore utilised to reveal redundancies in existing datasets and to help reduce redundancy in future annotation efforts.
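The cross-dataset protocol the abstract refers to can be illustrated with a toy sketch: train a classifier on one dataset and evaluate it on another, then compare against in-dataset performance. Everything below is a hypothetical stand-in (the tiny keyword model, the example sentences, and the function names are illustrative assumptions, not the paper's models or corpora):

```python
from collections import Counter

# Toy stand-ins for two abusive-language datasets (label 1 = abusive).
dataset_a = [("you are awful", 1), ("have a nice day", 0),
             ("you are terrible", 1), ("great work today", 0)]
dataset_b = [("what an awful person", 1), ("lovely weather", 0),
             ("terrible and awful", 1), ("nice to meet you", 0)]

def train_keyword_model(data):
    # Count word frequencies per class (a crude bag-of-words model).
    counts = {0: Counter(), 1: Counter()}
    for text, label in data:
        counts[label].update(text.lower().split())
    return counts

def predict(model, text):
    # Score each class by summed word frequency; ties go to non-abusive.
    scores = {c: sum(model[c][w] for w in text.lower().split()) for c in model}
    return 1 if scores[1] > scores[0] else 0

def cross_dataset_accuracy(train, test):
    # Train on one dataset, evaluate on another (or the same one).
    model = train_keyword_model(train)
    return sum(predict(model, t) == y for t, y in test) / len(test)

print("A -> A:", cross_dataset_accuracy(dataset_a, dataset_a))  # in-dataset
print("A -> B:", cross_dataset_accuracy(dataset_a, dataset_b))  # cross-dataset
```

On this toy data the model scores perfectly in-dataset but drops when tested across datasets, mirroring (in miniature) the generalisability gap the paper measures.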