Bilingual Autoencoders with Global Descriptors for Modeling Parallel Sentences

2016-12-01 · COLING 2016

Biao Zhang, Deyi Xiong, Jinsong Su, Hong Duan, Min Zhang

Abstract

Parallel sentence representations are important for bilingual and cross-lingual tasks in natural language processing. In this paper, we explore a bilingual autoencoder approach to modeling parallel sentences. We extract sentence-level global descriptors (e.g. min, max) from word embeddings and construct two monolingual autoencoders over these descriptors, one for the source language and one for the target language. To tightly connect the two autoencoders through bilingual correspondences, we force them to share the same decoding parameters and minimize a corpus-level semantic distance between the two languages. Optimized towards a joint objective of reconstruction and semantic errors, our bilingual autoencoder learns continuous-valued latent representations for parallel sentences. Both intrinsic and extrinsic evaluations on statistical machine translation tasks show that our autoencoder achieves substantial improvements over the baselines.
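The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the tanh encoder, the linear decoder, the squared-error terms, and the weighting parameter `alpha` are all assumptions made for illustration. The abstract specifies only min and max as example descriptors, shared decoding parameters, and a joint objective of reconstruction and semantic errors. The sketch also assumes both languages use the same embedding dimension (so one decoder can serve both), and it computes the semantic distance for a single sentence pair, whereas the paper minimizes a corpus-level distance.

```python
import numpy as np

def global_descriptors(word_vectors):
    """Concatenate the element-wise min and max over a sentence's word embeddings."""
    word_vectors = np.asarray(word_vectors, dtype=float)  # shape: (num_words, dim)
    return np.concatenate([word_vectors.min(axis=0), word_vectors.max(axis=0)])

def encode(x, W, b):
    """Language-specific encoder (tanh is an assumed nonlinearity)."""
    return np.tanh(W @ x + b)

def decode(h, W_dec, b_dec):
    """Decoder whose parameters are shared by both languages."""
    return W_dec @ h + b_dec

def joint_loss(x_src, x_tgt, params, alpha=0.5):
    """Weighted sum of reconstruction error and latent-space distance.

    alpha is a hypothetical trade-off weight, not a value from the paper.
    """
    h_src = encode(x_src, params["W_src"], params["b_src"])
    h_tgt = encode(x_tgt, params["W_tgt"], params["b_tgt"])

    # Both reconstructions go through the same decoding parameters.
    r_src = decode(h_src, params["W_dec"], params["b_dec"])
    r_tgt = decode(h_tgt, params["W_dec"], params["b_dec"])

    reconstruction = np.sum((r_src - x_src) ** 2) + np.sum((r_tgt - x_tgt) ** 2)
    # Per-pair stand-in for the corpus-level semantic distance.
    semantic = np.sum((h_src - h_tgt) ** 2)
    return alpha * reconstruction + (1.0 - alpha) * semantic
```

With `dim`-dimensional word embeddings, each descriptor vector has length `2 * dim`, so the encoder matrices have shape `(hidden, 2 * dim)` and the shared decoder matrix has shape `(2 * dim, hidden)`. Training would minimize `joint_loss` over all sentence pairs with a gradient-based optimizer, which this sketch leaves out.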

Tasks

Information Retrieval Machine Translation Sentence Translation Word Embeddings
