CombAlign: a Tool for Obtaining High-Quality Word Alignments

2021-05-01 · NoDaLiDa 2021 · Code Available

Steinþór Steingrímsson, Hrafn Loftsson, Andy Way

Abstract

Being able to generate accurate word alignments is useful for a variety of tasks. While statistical word aligners can work well, especially when parallel training data are plentiful, multilingual embedding models have recently been shown to give good results in unsupervised scenarios. We evaluate an ensemble method for word alignment on four language pairs and demonstrate that by combining multiple tools, taking advantage of their different approaches, substantial gains can be made. This holds for settings ranging from very low-resource to high-resource. Furthermore, we introduce a new gold alignment test set for Icelandic and a new easy-to-use tool for creating manual word alignments.
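
As an illustration of the ensemble idea described in the abstract, the following minimal sketch combines the outputs of several word aligners by voting. It assumes a simple threshold vote over (source index, target index) links. The function name `combine_alignments`, the `min_votes` parameter, and the sample alignments are hypothetical and are not taken from the paper or from the CombAlign code.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

# One alignment is a set of (source_index, target_index) links.
Alignment = Set[Tuple[int, int]]


def combine_alignments(alignments: Iterable[Alignment], min_votes: int) -> Alignment:
    """Keep every link proposed by at least `min_votes` aligners.

    This is an assumed voting scheme for illustration only.
    """
    votes = Counter(link for alignment in alignments for link in alignment)
    return {link for link, count in votes.items() if count >= min_votes}


# Hypothetical outputs of three aligners for the same sentence pair.
aligner_a = {(0, 0), (1, 1), (2, 2)}
aligner_b = {(0, 0), (1, 2), (2, 2)}
aligner_c = {(0, 0), (1, 1), (2, 3)}

combined = combine_alignments([aligner_a, aligner_b, aligner_c], min_votes=2)
print(sorted(combined))
```

With `min_votes=1` the result is the union of all proposed links (favouring recall), while setting it to the number of aligners gives their intersection (favouring precision). Values in between trade one off against the other.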
