SOTAVerified

Text Smoothing: Enhance Various Data Augmentation Methods on Text Classification Tasks

2021-11-16ACL ARR November 2021Unverified0· sign in to hype

Anonymous

Unverified — Be the first to reproduce this paper.

Reproduce

Abstract

Before entering the neural network, a token needs to be converted to its one-hot representation, which is a discrete distribution of the vocabulary. Smoothed representation is the probability of candidate tokens obtained from the pre-trained masked language model, which can be seen as a more informative augmented substitution to the one-hot representation. We propose an efficient data augmentation method, dub as text smoothing, by converting a sentence from its one-hot representation to controllable smoothed representation. We evaluate text smoothing on different datasets in a low-resource regime. Experimental results show that text smoothing outperforms various mainstream data augmentation methods by a substantial margin. Moreover, text smoothing can be combined with these data augmentation methods to achieve better performance.

Tasks

Reproductions