Detecting Entailment in Code-Mixed Hindi-English Conversations

2020-11-01 · EMNLP (WNUT) 2020

Sharanya Chakravarthy, Anjana Umapathy, Alan W Black

Code

github.com/sharanyarc96/hinglishnli
Official implementation (linked in the paper)

Abstract

The availability of large-scale corpora for Natural Language Inference (NLI) has spurred deep learning research in this area, though much of this research has focused solely on monolingual data. Code-mixing is the intertwined use of multiple languages, and it is common in informal conversations among multilingual speakers. Given the rising importance of dialogue agents, it is imperative that they understand code-mixing, but the scarcity of code-mixed Natural Language Understanding (NLU) datasets has precluded research in this area. The dataset by Khanuja et al. for detecting conversational entailment in code-mixed Hindi-English text is the first of its kind. We investigate the effectiveness of language modeling, data augmentation, translation, and architectural approaches to address the code-mixed, conversational, and low-resource aspects of this dataset. We obtain an 8.09% increase in test set accuracy over the current state of the art.

Tasks

Data Augmentation, Language Modeling, Natural Language Inference, Natural Language Understanding, Translation