
Investigating the Use of BERT Anchors for Bilingual Lexicon Induction with Minimal Supervision

2021-11-16 · ACL ARR November 2021

Anonymous


Abstract

This paper investigates the use of static anchors extracted from transformer architectures for the task of Bilingual Lexicon Induction. We revisit an existing approach built around the ELMo architecture and apply its methodology to the BERT family of language models. Experiments are performed and analysed for three language pairs, combining English with three target languages from very different language families: Hindi, Dutch, and Russian. Although the contextualised approach does not outperform the state-of-the-art VecMap method, we find that it is easily adaptable to newer transformer models and can compete with the MUSE approach. An error analysis reveals interesting trends across languages and shows how the method could be further improved by building on the basic hypothesis that transformer embeddings can indeed be decomposed into a static anchor and a dynamic context component. We make the code, the extracted anchors (before and after alignment), and the modified train and test sets available for use.
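The central hypothesis — that a word's contextual embeddings can be decomposed into a static anchor plus a dynamic context component — can be sketched as follows. This is a minimal illustration on synthetic vectors, not the paper's released code; the function names and the choice of the mean as the anchor are assumptions made here for exposition:

```python
import numpy as np

def extract_anchor(contextual_embeddings: np.ndarray) -> np.ndarray:
    """Static anchor: the mean of a word's contextual embeddings
    collected over many occurrences in a corpus (an assumption of
    this sketch, one common way to derive a static vector)."""
    return contextual_embeddings.mean(axis=0)

def context_component(embedding: np.ndarray, anchor: np.ndarray) -> np.ndarray:
    """Dynamic context component: what remains of a single
    occurrence's embedding once the static anchor is removed."""
    return embedding - anchor

# Toy data: 5 contextual embeddings of one word, dimension 4.
rng = np.random.default_rng(0)
embs = rng.normal(size=(5, 4))

anchor = extract_anchor(embs)
contexts = np.stack([context_component(e, anchor) for e in embs])

# Each occurrence is exactly anchor + context by construction...
assert np.allclose(embs, anchor + contexts)
# ...and the context components average to (near) zero.
assert np.allclose(contexts.mean(axis=0), 0.0)
```

Under this view, bilingual lexicon induction operates on the anchors alone: the per-occurrence context components are discarded, and only the static anchors of the two languages are aligned.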
