The Power of Character N-grams in Native Language Identification
2017-09-01 · WS 2017
Artur Kulmizev, Bo Blankers, Johannes Bjerva, Malvina Nissim, Gertjan Van Noord, Barbara Plank, Martijn Wieling
Abstract
In this paper, we explore the performance of a linear SVM trained on language-independent character features for the NLI Shared Task 2017. Our basic system (GRONINGEN) achieves the best performance (87.56 F1-score) on the evaluation set using only character n-grams of length 1 to 9 as features. We compare this system against several ensemble and meta-classifiers to examine how the linear system fares when combined with other, especially non-linear, classifiers. Special emphasis is placed on the topic bias that arises from the distribution of assessment essay prompts.
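To make the feature set concrete, the following is a minimal sketch of how character n-grams of length 1 to 9 could be counted for a single essay. It is an illustration only: the function name `char_ngrams`, the use of raw counts, and the absence of any preprocessing or weighting are assumptions, since the abstract does not describe those details of the GRONINGEN system.

```python
from collections import Counter


def char_ngrams(text, n_min=1, n_max=9):
    """Count every character n-gram of length n_min..n_max in text.

    Hypothetical helper for illustration; raw counts are an assumption,
    not necessarily the weighting used in the paper.
    """
    counts = Counter()
    for n in range(n_min, n_max + 1):
        # Slide a window of width n over the text; texts shorter than n
        # contribute nothing because the range is empty.
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


# Example: features for a short string (spaces count as characters).
features = char_ngrams("the cat")
```

Each essay would then be represented by such a sparse count vector, which a linear classifier such as an SVM can consume directly. Because the features are plain character sequences, no language-specific tokenizer or tagger is needed.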