Neural Machine Translation Data Generation and Augmentation using ChatGPT

2023-07-11

Wayne Yang, Garrett Nicolai

Abstract

Neural models have revolutionized the field of machine translation, but creating parallel corpora is expensive and time-consuming. We investigate an alternative to manual parallel corpora: hallucinated parallel corpora created by generative language models. Although these models are themselves trained on parallel data, they can leverage a multilingual vector space to create data, and may be able to supplement small, manually procured corpora. Our experiments highlight two key findings: despite a lack of diversity in their output, the hallucinated data improves the translation signal, even when the domain clashes with the original dataset.
