The Power of Character N-grams in Native Language Identification

2017-09-01 · WS 2017

Artur Kulmizev, Bo Blankers, Johannes Bjerva, Malvina Nissim, Gertjan Van Noord, Barbara Plank, Martijn Wieling

Abstract

In this paper, we explore the performance of a linear SVM trained on language-independent character features for the NLI Shared Task 2017. Our basic system (GRONINGEN) achieves the best performance (an F1-score of 87.56) on the evaluation set, using only character n-grams of length 1 to 9 as features. We compare this system against several ensemble and meta-classifiers to examine how the linear system fares when combined with other classifiers, especially non-linear ones. Special emphasis is placed on the topic bias that arises from the distribution of essay prompts in the assessment data.
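The abstract names only the model family (a linear SVM) and the feature set (character n-grams of length 1 to 9). The sketch below is a minimal, dependency-free illustration of that combination, not the authors' implementation. It assumes L2-normalised raw n-gram counts as feature weights, a Pegasos-style subgradient solver for the hinge loss, and one-vs-rest handling of multiple classes; the paper's actual weighting scheme, SVM implementation and hyperparameters are not stated here. All function names and the toy data are hypothetical.

```python
import math
import random
from collections import Counter


def char_ngrams(text, n_min=1, n_max=9):
    """Count every character n-gram of length n_min..n_max in text."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def featurize(text):
    """Sparse, L2-normalised 1-9 character n-gram vector (weighting is an assumption)."""
    counts = char_ngrams(text)
    norm = math.sqrt(sum(v * v for v in counts.values()))
    return {k: v / norm for k, v in counts.items()} if norm else {}


def dot(w, x):
    """Dot product of two sparse vectors stored as dicts."""
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_binary_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Linear SVM (labels +1/-1, no bias term) trained with Pegasos subgradient steps."""
    rng = random.Random(seed)
    w = {}
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            # Check the margin with the current weights, before the shrink step.
            violated = y[i] * dot(w, X[i]) < 1
            # Shrink step contributed by the L2 regulariser.
            for k in w:
                w[k] *= 1.0 - eta * lam
            # Hinge-loss subgradient step, only when the margin is violated.
            if violated:
                for k, v in X[i].items():
                    w[k] = w.get(k, 0.0) + eta * y[i] * v
    return w


def train_one_vs_rest(texts, labels, **kwargs):
    """Train one binary SVM per class (that class vs. all others)."""
    X = [featurize(t) for t in texts]
    models = {}
    for c in sorted(set(labels)):
        y = [1 if lab == c else -1 for lab in labels]
        models[c] = train_binary_svm(X, y, **kwargs)
    return models


def predict(models, text):
    """Return the class whose weight vector scores the text highest."""
    x = featurize(text)
    return max(models, key=lambda c: dot(models[c], x))


# Hypothetical toy data: two "classes" with disjoint character inventories.
texts = ["xxxx", "xxyx", "zzzz", "zzwz"]
labels = ["A", "A", "B", "B"]
models = train_one_vs_rest(texts, labels)
print(predict(models, "xxx"), predict(models, "zzz"))
```

Writing the solver by hand keeps the sketch self-contained and shows that the model is simply one weight per n-gram per class. For real essay data, an off-the-shelf solver such as scikit-learn's `LinearSVC`, fed by `TfidfVectorizer(analyzer="char", ngram_range=(1, 9))`, would be the more practical choice; whether the authors used these tools is not stated here.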

Tasks

Language Identification, Native Language Identification, Text Classification