Rare but Severe Errors Induced by Minimal Deletions in English-Chinese Neural Machine Translation
2021-10-16 · ACL ARR October 2021
Anonymous
Abstract
We examine how rare but severe errors can be induced in English-Chinese and Chinese-English Transformer-based neural machine translation by minimal deletions in the source text. We also examine the effect of training data size on the number and types of pathological cases induced by these perturbations and find significant variation. We find that one type of hallucination can be remedied through data preprocessing, and that in a character-based model deleting words hurts more than deleting characters, even though deleting characters introduces nonsense words.
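As an illustration of what a minimal deletion might look like, the sketch below generates every variant of a source sentence with exactly one word or one character removed. It is not the paper's procedure: the function names `delete_one_word` and `delete_one_char` are hypothetical, and splitting on whitespace is an assumption that only suits English source text (Chinese would need a word segmenter).

```python
from __future__ import annotations


def delete_one_word(sentence: str) -> list[str]:
    """Return all variants of `sentence` with exactly one word removed.

    Assumption: words are whitespace-separated, which fits English
    but not unsegmented Chinese text.
    """
    words = sentence.split()
    return [" ".join(words[:i] + words[i + 1:]) for i in range(len(words))]


def delete_one_char(sentence: str) -> list[str]:
    """Return all variants of `sentence` with exactly one character removed.

    Whitespace characters are skipped so that each variant changes a word
    rather than merging two neighbouring words.
    """
    return [
        sentence[:i] + sentence[i + 1:]
        for i in range(len(sentence))
        if not sentence[i].isspace()
    ]


if __name__ == "__main__":
    source = "the cat sat"
    for variant in delete_one_word(source):
        print("word deletion:", variant)
    for variant in delete_one_char(source):
        print("char deletion:", variant)
```

Each variant could then be passed to a translation model and its output compared with the translation of the unperturbed sentence. Character deletion tends to produce nonsense tokens such as "ct" from "cat", which is the kind of input the abstract contrasts with whole-word deletion.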