Improving Native Language Identification by Using Spelling Errors

2017-07-01 · ACL 2017

Lingzhen Chen, Carlo Strapparava, Vivi Nastase

Abstract

In this paper, we explore spelling errors, a previously under-explored source of information, for detecting the native language of a writer. We note that character n-grams from misspelled words are highly indicative of the author's native language. In combination with other lexical features, spelling error features lead to a 1.2% improvement in accuracy on classifying texts in the TOEFL11 corpus by the author's native language, compared to systems participating in the NLI shared task.
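
The core feature described above, character n-grams taken only from misspelled words, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the n-gram range (1 to 3), and the use of a plain vocabulary set to decide what counts as a misspelling are all assumptions made for illustration.

```python
from collections import Counter


def char_ngrams(word, n_min=1, n_max=3):
    """Return all character n-grams of `word` for n in [n_min, n_max]."""
    return [
        word[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(word) - n + 1)
    ]


def misspelling_ngram_features(tokens, vocabulary, n_min=1, n_max=3):
    """Count character n-grams over tokens that are not in `vocabulary`.

    Assumption: a token is treated as misspelled when it is alphabetic
    and absent from the vocabulary. A real system would more likely use
    a spell checker.
    """
    counts = Counter()
    for tok in tokens:
        w = tok.lower()
        if w.isalpha() and w not in vocabulary:
            counts.update(char_ngrams(w, n_min, n_max))
    return counts


# Hypothetical toy vocabulary and sentence, for illustration only.
vocab = {"the", "government", "is", "responsible"}
feats = misspelling_ngram_features(
    "The goverment is responsable".split(), vocab
)
```

The resulting counts could then be fed, alongside other lexical features, to any standard text classifier; which classifier and which feature weighting to use is left open here.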

Tasks

Language Identification, Native Language Identification
