Language Identification and Analysis of Code-Switched Social Media Text
2018-07-01 · WS 2018
Deepthi Mave, Suraj Maharjan, Thamar Solorio
Abstract
In this paper, we compare word-level language identification systems on code-switched Hindi-English data and on a standard Spanish-English dataset. To this end, we build a new code-switched dataset for Hindi-English. To understand the code-switching patterns in these language pairs, we investigate different code-switching metrics. We find that the CRF model outperforms the neural-network-based models by a margin of 2-5 percentage points for Spanish-English and 3-5 percentage points for Hindi-English.
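The abstract does not name the code-switching metrics it investigates. As an illustration of what such a metric looks like, the sketch below computes the Code-Mixing Index (CMI), a commonly used utterance-level measure, from word-level language tags. This is not necessarily one of the metrics used in the paper. The tag names ("hi", "en", "other") and the function name are assumptions made for this example.

```python
from collections import Counter


def code_mixing_index(tags, lang_tags=("hi", "en")):
    """Code-Mixing Index for one utterance, given one tag per token.

    CMI = 100 * (1 - max_lang_count / (n - u)), where n is the number of
    tokens and u is the number of tokens with a language-independent tag.
    Returns 0.0 when the utterance has no language-tagged tokens.
    The tag inventory ("hi", "en") is a hypothetical choice for this sketch.
    """
    # Count only tokens tagged with one of the languages of interest.
    counts = Counter(t for t in tags if t in lang_tags)
    lang_tokens = sum(counts.values())  # this is n - u
    if lang_tokens == 0:
        return 0.0
    dominant = max(counts.values())
    return 100.0 * (1.0 - dominant / lang_tokens)


if __name__ == "__main__":
    # Hypothetical word-level tags for a six-token Hindi-English utterance.
    example = ["hi", "hi", "en", "other", "en", "hi"]
    print(code_mixing_index(example))
```

A monolingual utterance scores 0, and the score grows as the non-dominant language takes a larger share of the language-tagged tokens. In practice the tags would come from a word-level language identifier such as the CRF or neural models compared in the paper.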