Data Augmentation for Transformer-based G2P

2020-07-01 · WS 2020

Zach Ryan, Mans Hulden

Abstract

The Transformer model has been shown to outperform other neural seq2seq models in several character-level tasks. It is unclear, however, whether the Transformer would benefit as much as other seq2seq models from data augmentation strategies in the low-resource setting. In this paper we explore strategies for data augmentation in the G2P task together with the Transformer model. Our results show that a relatively simple alignment-based strategy of identifying consistent input-output subsequences in grapheme-phoneme data and splicing such pieces together to generate hallucinated training data works well in the low-resource setting, often delivering substantial performance improvements over a standard Transformer model.
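
To make the splicing idea concrete, here is a minimal Python sketch. It assumes the grapheme-phoneme data has already been chunked into aligned subsequences; the toy ALIGNED_LEXICON, its chunking, the ARPAbet-style symbols, and the hallucinate function are all invented for illustration and do not reproduce the paper's actual procedure for identifying consistent subsequences.

```python
import random

# Toy aligned lexicon: each entry is a list of (grapheme chunk, phoneme chunk)
# pairs, such as a monotonic character aligner might produce. Chunking and
# phoneme symbols are made up for this example.
ALIGNED_LEXICON = [
    [("ph", "F"), ("o", "OW"), ("n", "N"), ("e", "")],
    [("n", "N"), ("igh", "AY"), ("t", "T")],
    [("c", "K"), ("a", "AE"), ("t", "T")],
]

def hallucinate(lexicon, pieces_range=(2, 4), n_examples=5, seed=0):
    """Splice aligned grapheme-phoneme chunks drawn from real entries
    into synthetic (spelling, pronunciation) training pairs."""
    rng = random.Random(seed)
    # Pool every aligned chunk from every real entry.
    pool = [pair for entry in lexicon for pair in entry]
    examples = []
    for _ in range(n_examples):
        k = rng.randint(*pieces_range)
        chunks = [rng.choice(pool) for _ in range(k)]
        spelling = "".join(g for g, _ in chunks)
        pron = " ".join(p for _, p in chunks if p)  # drop silent chunks
        examples.append((spelling, pron))
    return examples

if __name__ == "__main__":
    for spelling, pron in hallucinate(ALIGNED_LEXICON):
        print(f"{spelling}\t{pron}")
```

Because each spliced chunk preserves a real grapheme-to-phoneme correspondence, the hallucinated words, while not genuine lexical items, still expose the model to valid local mappings, which is plausibly why this kind of augmentation helps in the low-resource setting.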
