
Training data reduction for multilingual Spoken Language Understanding systems

2021-12-01 · ICON 2021

Anmol Bansal, Anjali Shenoy, Krishna Chaitanya Pappu, Kay Rottmann, Anurag Dwarakanath

Abstract

Fine-tuning self-supervised pre-trained language models such as BERT has significantly improved state-of-the-art performance on natural language processing tasks. Similar fine-tuning setups can also be used in commercial large-scale Spoken Language Understanding (SLU) systems to perform intent classification and slot tagging on user queries. Fine-tuning such powerful models for use in commercial systems requires large amounts of training data and compute resources to achieve high performance. This paper is a study of different empirical methods for identifying training-data redundancies in the fine-tuning paradigm. In particular, we explore rule-based and semantic techniques to reduce data in a multilingual fine-tuning setting and report our results on key SLU metrics. Through our experiments, we show that fine-tuning on a reduced data set achieves performance on par with, or better than, a model fine-tuned on the entire data set.
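The abstract does not spell out the specific reduction rules, but a rule-based redundancy check of the kind it describes could, for illustration, canonicalize each utterance and drop near-exact repeats. The sketch below is a hypothetical example (the normalization rules and function names are assumptions, not the paper's method):

```python
import re

def normalize(utterance: str) -> str:
    """Hypothetical rule-based canonical form: lowercase,
    strip punctuation, collapse whitespace."""
    text = utterance.lower()
    text = re.sub(r"[^\w\s]", "", text)
    return re.sub(r"\s+", " ", text).strip()

def deduplicate(utterances):
    """Keep the first occurrence of each canonical form,
    discarding utterances that add no new surface variation."""
    seen = set()
    kept = []
    for u in utterances:
        key = normalize(u)
        if key not in seen:
            seen.add(key)
            kept.append(u)
    return kept
```

A semantic variant of the same idea would replace the string key with a sentence-embedding similarity threshold, so that paraphrases (not just surface duplicates) are treated as redundant.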
