Vocabulary-informed Language Encoding

2022-10-01 · COLING 2022

Xi Ai, Bin Fang

Abstract

A multilingual model relies on language encodings to identify input languages, since it must distinguish between the input and output languages, or among all the languages involved, in cross-lingual tasks. Furthermore, we find that language encodings can refine the distinct morphologies of different languages to form a better isomorphic space for multilinguality. Leveraging this observation, we present a method that computes a vocabulary-informed language encoding as the representation of a given language, based on a local vocabulary covering a sufficient share of the most frequent word embeddings in that language. In our experiments, our method consistently improves the performance of multilingual models on unsupervised neural machine translation and cross-lingual embedding.
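The abstract does not spell out how the encoding is computed from the local vocabulary. As a rough sketch only, assuming the encoding is a frequency-weighted average over the smallest set of most frequent word embeddings that reaches a target coverage of the corpus mass (the function name `language_encoding` and the `coverage` parameter are illustrative, not from the paper):

```python
import numpy as np

def language_encoding(embeddings, frequencies, coverage=0.95):
    """Hypothetical sketch: build a language representation from the
    most frequent word embeddings of that language.

    embeddings  -- (V, d) array, one row per vocabulary word
    frequencies -- (V,) word counts in a monolingual corpus
    coverage    -- fraction of total corpus mass the local vocabulary
                   must cover (an assumed notion of "acceptable amount")
    """
    # Sort words by descending corpus frequency.
    order = np.argsort(frequencies)[::-1]
    freq_sorted = frequencies[order]

    # Smallest prefix of the sorted vocabulary reaching the coverage target.
    cum = np.cumsum(freq_sorted) / freq_sorted.sum()
    k = int(np.searchsorted(cum, coverage)) + 1
    top = order[:k]

    # Frequency-weighted mean of the selected embeddings.
    w = frequencies[top] / frequencies[top].sum()
    return (embeddings[top] * w[:, None]).sum(axis=0)
```

With `coverage=1.0` this reduces to a frequency-weighted mean over the full vocabulary; smaller values restrict the encoding to the high-frequency head, which is where the cross-lingual embedding spaces tend to be most isomorphic.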
