Cross-Domain Detection of Abusive Language Online
2018-10-01, WS 2018
Mladen Karan, Jan Šnajder
Abstract
We investigate to what extent models trained to detect general abusive language generalize across datasets labeled with different types of abusive language. To this end, we compare the cross-domain performance of simple classification models on nine different datasets, finding that the models fail to generalize to out-of-domain datasets and that having at least some in-domain data is important. We also show that using the frustratingly simple domain adaptation (Daumé III, 2007) in most cases improves the results over in-domain training, especially when used to augment a smaller dataset with a larger one.
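The feature augmentation at the core of the frustratingly simple domain adaptation method (Daumé III, 2007) can be sketched as below. This is a minimal illustration and not the authors' implementation: the function name `augment` and the domain labels `"source"` and `"target"` are hypothetical, and the abstract does not say which features or classifiers were used. Each feature vector is copied into a shared block plus a domain-specific block, so that a linear classifier trained on the combined data can learn both general and per-domain weights.

```python
def augment(features, domain):
    """Map a feature vector into the augmented space of Daumé III (2007).

    Layout of the result: [shared | source-only | target-only].
    Source examples become [x, x, 0]; target examples become [x, 0, x].
    The name `augment` and the domain labels are illustrative assumptions.
    """
    x = list(features)
    zeros = [0.0] * len(x)
    if domain == "source":
        return x + x + zeros
    if domain == "target":
        return x + zeros + x
    raise ValueError("domain must be 'source' or 'target'")


# Example: a larger out-of-domain dataset acts as the source,
# a smaller in-domain dataset as the target.
source_example = augment([1.0, 2.0], "source")
target_example = augment([3.0, 4.0], "target")
```

Both augmented sets would then be concatenated and passed to any standard classifier; test examples from the target domain are augmented with the `"target"` layout.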