A Deep Generative Approach to Native Language Identification

2020-12-01 · COLING 2020

Ehsan Lotfi, Ilia Markov, Walter Daelemans


Abstract

Native language identification (NLI), the task of identifying the native language (L1) of a person based on their writing in a second language (L2), is useful for a variety of purposes, including marketing, security, and educational applications. From a traditional machine learning perspective, NLI is usually framed as a multi-class classification task in which numerous designed features are combined to achieve state-of-the-art results. We introduce a deep generative language modelling (LM) approach to NLI, which consists of fine-tuning a separate GPT-2 model on texts written by authors with the same L1, and assigning a label to an unseen text based on the minimum LM loss across these fine-tuned GPT-2 models. Our method outperforms traditional machine learning approaches and currently achieves the best results on the benchmark NLI datasets.
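The decision rule in the abstract (one language model per L1, label chosen by minimum LM loss) can be illustrated with a minimal sketch. This is not the authors' implementation: to keep the example runnable without downloading model weights, a toy add-one smoothed character bigram model stands in for each fine-tuned GPT-2 model. The names `CharBigramLM`, `train_per_l1`, and `predict_l1` are hypothetical, and the input format (a dict mapping each L1 label to a list of training texts) is an assumption.

```python
import math
from collections import Counter


class CharBigramLM:
    """Toy stand-in for a fine-tuned GPT-2 model (assumption, not the paper's model).

    An add-one smoothed character bigram model trained on the texts of one L1 group.
    """

    def __init__(self, texts, vocab):
        # Reserve one extra slot so unseen characters get non-zero probability.
        self.vocab_size = len(vocab) + 1
        self.bigrams = Counter()
        self.contexts = Counter()
        for text in texts:
            padded = "^" + text  # "^" marks the start of a text
            for prev, cur in zip(padded, padded[1:]):
                self.bigrams[(prev, cur)] += 1
                self.contexts[prev] += 1

    def loss(self, text):
        """Mean negative log-likelihood per character, analogous to an LM loss."""
        padded = "^" + text
        total = 0.0
        for prev, cur in zip(padded, padded[1:]):
            prob = (self.bigrams[(prev, cur)] + 1) / (
                self.contexts[prev] + self.vocab_size
            )
            total -= math.log(prob)
        return total / max(len(text), 1)


def train_per_l1(corpus):
    """Train one model per L1 label; corpus maps label -> list of texts."""
    vocab = {ch for texts in corpus.values() for text in texts for ch in text}
    return {l1: CharBigramLM(texts, vocab) for l1, texts in corpus.items()}


def predict_l1(models, text):
    """Return the label whose model gives the minimum loss, plus all losses."""
    losses = {l1: model.loss(text) for l1, model in models.items()}
    return min(losses, key=losses.get), losses
```

With real models, `loss` would instead be the mean token-level cross-entropy that each fine-tuned GPT-2 model assigns to the unseen text; the `min` over per-L1 losses in `predict_l1` is the only part that mirrors the method as the abstract describes it.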

Tasks

BIG-bench Machine Learning, Language Identification, Language Modelling, Marketing, Multi-class Classification, Native Language Identification
