
Samsung R&D Institute Philippines at WMT 2023

2023-10-25

Jan Christian Blaise Cruz

Abstract

In this paper, we describe the constrained MT systems submitted by Samsung R&D Institute Philippines to the WMT 2023 General Translation Task for two directions: en→he and he→en. Our systems comprise Transformer-based sequence-to-sequence models that are trained with a mix of best practices: comprehensive data preprocessing pipelines, synthetic backtranslated data, and the use of noisy channel reranking during online decoding. On two public benchmarks, FLORES-200 and NTREX-128, our models perform comparably to, and sometimes outperform, strong unconstrained baseline systems such as mBART50 M2M and NLLB 200 MoE despite having significantly fewer parameters.
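The abstract does not give the scoring formula used for noisy channel reranking, so the following is only a minimal sketch of one common formulation, not the authors' implementation. It assumes each candidate translation y of a source x already carries three log-probabilities: a direct model score log P(y|x), a channel model score log P(x|y), and a language model score log P(y). The `Candidate` class, the function names, the weights `channel_weight` and `lm_weight`, and the length normalization are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """One hypothesis from beam search, with precomputed log-probabilities."""
    text: str
    direct_logprob: float   # log P(y | x), from the forward translation model
    channel_logprob: float  # log P(x | y), from the reverse translation model
    lm_logprob: float       # log P(y), from a target-side language model
    length: int             # number of target tokens, used for normalization


def noisy_channel_score(c: Candidate,
                        channel_weight: float = 0.5,
                        lm_weight: float = 0.5) -> float:
    """Combine the three scores linearly and normalize by target length.

    The weights and the normalization are assumptions; in practice they
    would be tuned on a development set.
    """
    combined = (c.direct_logprob
                + channel_weight * c.channel_logprob
                + lm_weight * c.lm_logprob)
    return combined / max(c.length, 1)


def rerank(candidates: List[Candidate],
           channel_weight: float = 0.5,
           lm_weight: float = 0.5) -> List[Candidate]:
    """Return candidates sorted from best to worst combined score."""
    return sorted(
        candidates,
        key=lambda c: noisy_channel_score(c, channel_weight, lm_weight),
        reverse=True,
    )


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    beam = [
        Candidate("A", direct_logprob=-2.0, channel_logprob=-6.0,
                  lm_logprob=-6.0, length=4),
        Candidate("B", direct_logprob=-3.0, channel_logprob=-2.0,
                  lm_logprob=-2.0, length=4),
    ]
    best = rerank(beam)[0]
    print(best.text)
```

In this toy beam, candidate A has the higher direct score, but B is preferred once the channel and language model scores are included. Setting both weights to zero recovers plain direct-model ranking.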
