Roles of Words: What Should (n’t) Be Augmented in Text Augmentation on Text Classification Tasks?

2021-11-16 · ACL ARR November 2021

Anonymous

Abstract

Text augmentation techniques are widely used in text classification to improve classifier performance, especially in low-resource scenarios. Previous text-editing-based methods augment text non-selectively: all words in a text are treated identically during augmentation, which can produce unsatisfactory augmented samples. In this work, we present four kinds of roles of words (ROWs), which serve different functions in text classification tasks, and design effective methods to extract these ROWs automatically from statistical and semantic perspectives. We then conduct systematic experiments on which ROWs should (or shouldn't) be augmented in classification tasks. These experiments reveal instructive patterns: certain ROWs are especially suitable, or unsuitable, for certain augmentation operations. Guided by these patterns, we propose a set of Selective Text Augmentation (STA) operations, which significantly outperform traditional methods and generalize well.
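The difference between non-selective and selective augmentation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' STA implementation: the scoring function (a Laplace-smoothed log ratio of class-conditional document frequencies), the 0.5 threshold, the toy corpus, and the names `word_label_scores`, `random_deletion`, and `selective_deletion` are all hypothetical. The score stands in loosely for a statistical perspective only; the semantic perspective and the four specific ROWs are not modeled.

```python
import math
import random
from collections import Counter


def word_label_scores(docs, labels, target):
    """Hypothetical statistical score: Laplace-smoothed log ratio of how often
    a word appears in documents labeled `target` vs. documents with other labels."""
    in_df, out_df = Counter(), Counter()
    n_in = n_out = 0
    for tokens, label in zip(docs, labels):
        if label == target:
            in_df.update(set(tokens))  # document frequency, not token count
            n_in += 1
        else:
            out_df.update(set(tokens))
            n_out += 1
    scores = {}
    for word in set(in_df) | set(out_df):
        p_in = (in_df[word] + 1) / (n_in + 2)
        p_out = (out_df[word] + 1) / (n_out + 2)
        scores[word] = math.log(p_in / p_out)
    return scores


def random_deletion(tokens, p, rng):
    """Non-selective baseline: every word is deleted with the same probability p."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept or [rng.choice(tokens)]  # never return an empty sample


def selective_deletion(tokens, p, protected, rng):
    """Selective variant: words in `protected` are never deleted;
    every other word is deleted with probability p."""
    if not tokens:
        return []
    kept = [t for t in tokens if t in protected or rng.random() >= p]
    return kept or [rng.choice(tokens)]


# Toy corpus (hypothetical data, for illustration only).
docs = [
    ["great", "movie", "loved", "it"],
    ["terrible", "movie", "hated", "it"],
    ["loved", "the", "acting"],
    ["boring", "plot", "hated"],
]
labels = ["pos", "neg", "pos", "neg"]

scores = word_label_scores(docs, labels, target="pos")
protected = {w for w, s in scores.items() if s > 0.5}  # hypothetical threshold

rng = random.Random(0)
print(sorted(protected))  # ['acting', 'great', 'loved', 'the']
print(random_deletion(docs[0], 0.5, rng))
print(selective_deletion(docs[0], 0.5, protected, rng))
```

On this four-document corpus, "the" is protected alongside "loved" and "great" simply because it happens to occur only in a positive review. A purely frequency-based score is only as informative as the corpus behind it, which is presumably one reason to pair a statistical view with a semantic one, as the abstract describes.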

Tasks

Classification, Text Augmentation, Text Classification
