Disfluency Detection for Vietnamese
2022-10-01 · COLING (WNUT) 2022
Mai Dao, Thinh Hung Truong, Dat Quoc Nguyen
Code
- github.com/vinairesearch/phodisfluency (Official, In paper) ★ 4
Abstract
In this paper, we present the first empirical study of Vietnamese disfluency detection. To conduct this study, we first create a disfluency detection dataset for Vietnamese, with manual annotations over two disfluency types. We then run experiments with strong baseline models and find that (i) automatic Vietnamese word segmentation improves the disfluency detection performance of the baselines, and (ii) the highest performance is obtained by fine-tuning pre-trained language models, among which the monolingual Vietnamese model PhoBERT outperforms the multilingual model XLM-R.