SOTAVerified

Towards cross-lingual distributed representations without parallel text trained with adversarial autoencoders

2016-08-09WS 2016Code Available0· sign in to hype

Antonio Valerio Miceli Barone

Code Available — Be the first to reproduce this paper.

Reproduce

Code

Abstract

Current approaches to learning vector representations of text that are compatible between different languages usually require some amount of parallel text, aligned at word, sentence or at least document level. We hypothesize however, that different natural languages share enough semantic structure that it should be possible, in principle, to learn compatible vector representations just by analyzing the monolingual distribution of words. In order to evaluate this hypothesis, we propose a scheme to map word vectors trained on a source language to vectors semantically compatible with word vectors trained on a target language using an adversarial autoencoder. We present preliminary qualitative results and discuss possible future developments of this technique, such as applications to cross-lingual sentence representations.

Tasks

Reproductions