Data Augmentation for Neural NLP

2023-02-22

Domagoj Pluščec, Jan Šnajder

Abstract

Data scarcity is a problem that occurs in languages and tasks where we do not have large amounts of labeled data but want to use state-of-the-art models. Such models are often deep learning models that require a significant amount of data to train. Acquiring data for various machine learning problems is accompanied by high labeling costs. Data augmentation is a low-cost approach for tackling data scarcity. This paper gives an overview of current state-of-the-art data augmentation methods used for natural language processing, with an emphasis on methods for neural and transformer-based models. Furthermore, it discusses the practical challenges of data augmentation, possible mitigations, and directions for future research.
