Two-Step Classification using Recasted Data for Low Resource Settings

2020-12-01 · Asian Chapter of the Association for Computational Linguistics · Code Available

Shagun Uppal, Vivek Gupta, Avinash Swaminathan, Haimin Zhang, Debanjan Mahata, Rakesh Gosangi, Rajiv Ratn Shah, Amanda Stent

Code

github.com/midas-research/hindi-nli-code
Official · In paper · PyTorch · ★ 15
github.com/midas-research/hindi-nli-data
Official · In paper · framework: none · ★ 14

Abstract

An NLP model's ability to reason should be independent of language. Previous work uses Natural Language Inference (NLI) to study the reasoning ability of models, mostly focusing on high-resource languages such as English. To address the scarcity of data in low-resource languages such as Hindi, we use data recasting to create NLI datasets from four existing text classification datasets. Through experiments, we show that our recasted dataset is devoid of statistical irregularities and spurious patterns. We further study the consistency of predictions made by textual entailment models and propose a consistency regulariser to remove pairwise inconsistencies in predictions. We propose a novel two-step classification method that uses textual entailment predictions for the classification task. We further improve performance by using a joint objective for classification and textual entailment. Supporting experimental results highlight the benefits of data recasting and the improvements in classification performance obtained with our approach.
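The recasting and two-step ideas described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the hypothesis template, the label names `entailed` / `not-entailed`, the function names, and the keyword-based `toy_entail_prob` scorer are all assumptions made for illustration, and English text is used only for readability (the paper targets Hindi). In practice the scorer would be a trained textual entailment model.

```python
from typing import Callable, List, Tuple

# Hypothetical hypothesis template; the paper's actual templates may differ.
TEMPLATE = "This text is about {label}."


def recast(text: str, gold: str, labels: List[str]) -> List[Tuple[str, str, str]]:
    """Turn one labelled classification example into one
    (premise, hypothesis, nli_label) pair per candidate class."""
    return [
        (text, TEMPLATE.format(label=lab),
         "entailed" if lab == gold else "not-entailed")
        for lab in labels
    ]


def two_step_classify(
    text: str,
    labels: List[str],
    entail_prob: Callable[[str, str], float],
) -> str:
    """Step 1: score each label's hypothesis with an entailment model.
    Step 2: return the label whose hypothesis is most strongly entailed."""
    scores = {lab: entail_prob(text, TEMPLATE.format(label=lab)) for lab in labels}
    return max(scores, key=scores.get)


def toy_entail_prob(premise: str, hypothesis: str) -> float:
    """Stand-in scorer (assumption): 1.0 if the label word of the
    hypothesis occurs in the premise, otherwise 0.0."""
    label_word = hypothesis.rstrip(".").split()[-1].lower()
    return 1.0 if label_word in premise.lower() else 0.0


labels = ["sports", "politics"]
pairs = recast("The sports final was close", "sports", labels)
predicted = two_step_classify("The politics debate ran long", labels, toy_entail_prob)
```

Here `pairs` holds one entailed and one not-entailed premise-hypothesis pair for the same input, which is how a single classification example can yield several NLI examples. Passing the scorer as a parameter keeps the classification step independent of whichever entailment model is plugged in.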

Tasks

Classification, Natural Language Inference, Text Classification, Vocal Bursts Valence Prediction
