Simple Features for Strong Performance on Named Entity Recognition in Code-Switched Twitter Data

2018-07-01 · WS 2018

Devanshu Jain, Maria Kustikova, Mayank Darbari, Rishabh Gupta, Stephen Mayhew

Abstract

In this work, we address the problem of Named Entity Recognition (NER) in code-switched tweets as part of the Workshop on Computational Approaches to Linguistic Code-switching (CALCS) at ACL'18. Code-switching is the phenomenon in which a speaker switches between two languages, or variants of the same language, within or across utterances, known as intra-sentential and inter-sentential code-switching, respectively. Processing such data with state-of-the-art methods is challenging because such technology is generally geared towards monolingual text. In this paper, we explored ways to use language identification and translation to recognize named entities in such data; however, simple features (without multilingual features) with a Conditional Random Field (CRF) classifier achieved the best results. Our experiments were mainly aimed at the English-Spanish (ENG-SPA) dataset, but we also submitted a language-independent version of our system to the Arabic-Egyptian (MSA-EGY) dataset and achieved good results.
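To make "simple features" concrete, here is a minimal sketch of the kind of per-token, language-independent features that are commonly fed to a CRF tagger. The abstract does not list the authors' actual feature set, so every feature below (word shape, affixes, hashtag and mention flags, neighbouring words) and the helper names `word_shape`, `token_features`, and `sentence_features` are illustrative assumptions, not the paper's method.

```python
def word_shape(token):
    """Map a token to a coarse shape, collapsing repeated symbols.

    Uppercase -> 'X', lowercase -> 'x', digit -> 'd', anything else is kept.
    """
    shape = []
    for ch in token:
        if ch.isupper():
            sym = "X"
        elif ch.islower():
            sym = "x"
        elif ch.isdigit():
            sym = "d"
        else:
            sym = ch
        # Skip the symbol if it repeats the previous one.
        if not shape or shape[-1] != sym:
            shape.append(sym)
    return "".join(shape)


def token_features(tokens, i):
    """Build a feature dict for tokens[i] (hypothetical feature set)."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "shape": word_shape(tok),
        "prefix3": tok[:3].lower(),
        "suffix3": tok[-3:].lower(),
        "is_title": tok.istitle(),
        "is_hashtag": tok.startswith("#"),   # Twitter-specific cue
        "is_mention": tok.startswith("@"),   # Twitter-specific cue
        # One-word context window, with sentinels at sentence boundaries.
        "prev_lower": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_lower": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


def sentence_features(tokens):
    """Feature dicts for every token in one tweet."""
    return [token_features(tokens, i) for i in range(len(tokens))]


# A made-up code-switched example, not taken from the shared-task data.
tweet = ["Voy", "a", "Madrid", "with", "@maria"]
features = sentence_features(tweet)
```

None of these features depend on knowing which language a token belongs to, which is the sense in which such a system can be language-independent. A list of per-token dicts like this is the input format accepted by CRF toolkits such as sklearn-crfsuite, though the abstract does not say which toolkit the authors used.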