SOTAVerified

Rakuten’s Participation in WAT 2022: Parallel Dataset Filtering by Leveraging Vocabulary Heterogeneity

2022-10-01WAT 2022Unverified0· sign in to hype

Alberto Poncelas, Johanes Effendi, Ohnmar Htun, Sunil Yadav, Dongzhe Wang, Saurabh Jain

Unverified — Be the first to reproduce this paper.

Reproduce

Abstract

This paper introduces our neural machine translation system’s participation in the WAT 2022 shared translation task (team ID: sakura). We participated in the Parallel Data Filtering Task. Our approach based on Feature Decay Algorithms achieved +1.4 and +2.4 BLEU points for English to Japanese and Japanese to English respectively compared to the model trained on the full dataset, showing the effectiveness of FDA on in-domain data selection.

Tasks

Reproductions