Estimating Code-Switching on Twitter with a Novel Generalized Word-Level Language Detection Technique

2017-07-01 · ACL 2017

Shruti Rijhwani, Royal Sequiera, Monojit Choudhury, Kalika Bali, Chandra Shekhar Maddila

Abstract

Word-level language detection is necessary for analyzing code-switched text, where multiple languages could be mixed within a sentence. Existing models are restricted to code-switching between two specific languages and fail in real-world scenarios as text input rarely has a priori information on the languages used. We present a novel unsupervised word-level language detection technique for code-switched text for an arbitrarily large number of languages, which does not require any manually annotated training data. Our experiments with tweets in seven languages show a 74% relative error reduction in word-level labeling with respect to competitive baselines. We then use this system to conduct a large-scale quantitative analysis of code-switching patterns on Twitter, both global as well as region-specific, with 58M tweets.
