Translation of Biomedical Documents with Focus on Spanish-English

2018-10-01 · WS 2018

Mirela-Stefania Duma, Wolfgang Menzel

Abstract

For the WMT 2018 shared task on translating documents in the biomedical domain, we developed a scoring formula that uses a simple yet effective method of weighting term frequencies and integrated it into a data selection pipeline. The method was applied to five language pairs and performed best on Portuguese-English, where a BLEU score of 41.84 placed it third out of seven runs submitted by three institutions. In this paper, we describe our method and results with a special focus on Spanish-English, where we compare it against a state-of-the-art method. Our contribution to the task lies in introducing a fast, unsupervised method for selecting domain-specific data for training models that obtain good results using only 10% of the general-domain data.
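The abstract does not state the scoring formula itself, so the following is only a minimal sketch of the general idea of term-frequency-based data selection, not the authors' method. It assumes a smoothed log-ratio of in-domain to general-domain term frequencies as a stand-in weight, whitespace tokenization, and hypothetical function names (`term_weights`, `score`, `select_top`). It ranks general-domain sentences by their average term weight and keeps the top 10%.

```python
from collections import Counter
import math


def term_weights(in_domain, general):
    """Assumed stand-in weighting: add-one smoothed log-ratio of a term's
    relative frequency in the in-domain corpus versus the general corpus."""
    dom = Counter(w for s in in_domain for w in s.lower().split())
    gen = Counter(w for s in general for w in s.lower().split())
    dom_total = sum(dom.values())
    gen_total = sum(gen.values())
    vocab = len(set(dom) | set(gen))
    weights = {}
    for word, count in dom.items():
        p_dom = (count + 1) / (dom_total + vocab)
        p_gen = (gen[word] + 1) / (gen_total + vocab)
        weights[word] = math.log(p_dom / p_gen)
    return weights


def score(sentence, weights):
    """Average weight of the sentence's tokens; terms never seen in the
    in-domain corpus contribute 0."""
    tokens = sentence.lower().split()
    if not tokens:
        return 0.0
    return sum(weights.get(t, 0.0) for t in tokens) / len(tokens)


def select_top(general, weights, fraction=0.10):
    """Keep the highest-scoring fraction of general-domain sentences."""
    k = max(1, math.ceil(len(general) * fraction))
    ranked = sorted(general, key=lambda s: score(s, weights), reverse=True)
    return ranked[:k]
```

In a real pipeline the selected sentences would be paired with their translations and used as training data; the 10% fraction mirrors the figure quoted in the abstract, while every other choice here is an illustrative assumption.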