
Learning Data Augmentation Schedules for Natural Language Processing

2021-11-01 · EMNLP (Insights) 2021 · Code Available

Daphné Chopard, Matthias S. Treder, Irena Spasić


Abstract

Despite its proven effectiveness in other fields, data augmentation is less popular in natural language processing (NLP) due to its complexity and limited payoff. A recent study (Longpre et al., 2020) showed, for example, that task-agnostic data augmentations fail to consistently boost the performance of pretrained transformers, even in low-data regimes. In this paper, we investigate whether data-driven augmentation scheduling and the integration of a wider set of transformations can improve performance where fixed and limited policies were unsuccessful. Our results suggest that, while this approach can help the training process in some settings, the improvements are insubstantial. This negative result is meant to help researchers better understand the limitations of data augmentation for NLP.
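To make the abstract's terminology concrete, the sketch below shows what "task-agnostic" augmentations typically look like in NLP: label-preserving token-level edits (random swap and random deletion, in the style of EDA-like methods) applied without any knowledge of the downstream task. This is an illustrative example, not the paper's actual augmentation set or scheduling method; all function names are hypothetical.

```python
import random

def random_swap(tokens, n_swaps=1, rng=None):
    # Task-agnostic augmentation: swap n_swaps random pairs of tokens.
    rng = rng or random.Random()
    tokens = list(tokens)
    for _ in range(n_swaps):
        i = rng.randrange(len(tokens))
        j = rng.randrange(len(tokens))
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens

def random_deletion(tokens, p=0.1, rng=None):
    # Task-agnostic augmentation: drop each token with probability p,
    # always keeping at least one token so the example stays non-empty.
    rng = rng or random.Random()
    kept = [t for t in tokens if rng.random() > p]
    return kept if kept else [rng.choice(tokens)]

sentence = "data augmentation is less popular in nlp".split()
rng = random.Random(0)  # fixed seed for reproducible augmentations
print(random_swap(sentence, n_swaps=2, rng=rng))
print(random_deletion(sentence, p=0.2, rng=rng))
```

A "data-driven schedule", as investigated in the paper, would then decide per training step which of these transformations to apply and how aggressively, rather than fixing one policy up front.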
