Neural Machine Translation Data Generation and Augmentation using ChatGPT
2023-07-11
Wayne Yang, Garrett Nicolai
Abstract
Neural models have revolutionized the field of machine translation, but creating parallel corpora is expensive and time-consuming. We investigate an alternative to manual parallel corpora: hallucinated parallel corpora created by generative language models. Although these models are themselves trained on parallel data, they can leverage a multilingual vector space to create data, and may be able to supplement small manually procured corpora. Our experiments highlight two key findings: despite a lack of diversity in their output, the hallucinated data improves the translation signal, and it does so even when the domain clashes with the original dataset.