
Practical Transformer-based Multilingual Text Classification

2021-06-01 · NAACL 2021

Cindy Wang, Michele Banko


Abstract

Transformer-based methods are appealing for multilingual text classification, but common research benchmarks like XNLI (Conneau et al., 2018) do not reflect the data availability and task variety of industry applications. We present an empirical comparison of transformer-based text classification models in a variety of practical monolingual and multilingual pretraining and fine-tuning settings. We evaluate these methods on two distinct tasks in five different languages. Departing from prior work, our results show that multilingual language models can outperform monolingual ones in some downstream tasks and target languages. We additionally show that practical modifications such as task- and domain-adaptive pretraining and data augmentation can improve classification performance without the need for additional labeled data.
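
The abstract does not specify the exact models or hyperparameters, but the standard setup it describes fine-tunes a multilingual pretrained encoder on labeled classification data. The sketch below, using the Hugging Face transformers library, is a minimal illustration of one such fine-tuning step; the checkpoint name xlm-roberta-base, the binary label set, and the toy bilingual batch are assumptions for demonstration, not the paper's configuration.

```python
# Minimal sketch: one fine-tuning step of a multilingual transformer
# for text classification. Model choice, labels, and example texts are
# illustrative assumptions, not the paper's exact setup.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "xlm-roberta-base"  # assumption: any multilingual checkpoint works

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# Toy bilingual batch: a single classifier can serve multiple target
# languages because the encoder was pretrained on multilingual text.
texts = ["This movie was great!", "Dieser Film war schrecklich."]
labels = torch.tensor([1, 0])

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
outputs = model(**batch, labels=labels)

# Standard fine-tuning step: backpropagate the classification loss.
outputs.loss.backward()
print(f"loss = {outputs.loss.item():.3f}")
```

In practice this step would be wrapped in a full training loop (or transformers.Trainer) over each task's labeled data; task- and domain-adaptive pretraining, as discussed in the paper, would instead continue masked language modeling on unlabeled in-domain text before this fine-tuning stage.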
