Weakly Supervised Attentional Model for Low Resource Ad-hoc Cross-lingual Information Retrieval

WS 2019 · 2019-11-01

Lingjun Zhao, Rabih Zbib, Zhuolin Jiang, Damianos Karakos, Zhongqiang Huang

Abstract

We propose a weakly supervised neural model for Ad-hoc Cross-lingual Information Retrieval (CLIR) from low-resource languages. Low-resource languages often lack relevance annotations for CLIR, and when annotations are available, the training data usually has limited coverage of possible queries. In this paper, we design a model that does not require relevance annotations; instead, it is trained on samples extracted from translation corpora as weak supervision. The model relies on an attention mechanism to learn spans in the foreign sentence that are relevant to the query. We report experiments on two low-resource languages, Swahili and Tagalog, trained on fewer than 100k parallel sentences each. The proposed model achieves an improvement of 19 MAP points over using CNNs for feature extraction, 12 points over machine translation-based CLIR, and up to 6 points over probabilistic CLIR models.
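The core idea of attention-based relevance can be illustrated with a minimal sketch. This is not the paper's implementation: the embeddings, the dot-product similarity, the per-term attention over sentence positions, and the mean aggregation are all simplifying assumptions made for illustration. Each query term attends over the positions of the foreign sentence, and the attended similarities are pooled into a single relevance score.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_relevance(query_vecs, sent_vecs):
    """Score a foreign sentence against a query (illustrative sketch).

    query_vecs: (num_query_terms, dim) term embeddings
    sent_vecs:  (sentence_length, dim) token embeddings

    Each query term attends over sentence positions; the score for a
    term is its expected similarity under the attention distribution,
    so high-similarity spans dominate. Term scores are averaged.
    """
    term_scores = []
    for q in query_vecs:
        sims = sent_vecs @ q                # similarity per sentence position
        attn = softmax(sims)                # attention over sentence positions
        term_scores.append(float(attn @ sims))  # attention-weighted similarity
    return float(np.mean(term_scores))

# toy, randomly generated embeddings (hypothetical dimensions)
rng = np.random.default_rng(0)
query = rng.normal(size=(2, 3))   # 2 query terms, dim 3
sentence = rng.normal(size=(4, 3))  # 4 sentence positions, dim 3
print(attention_relevance(query, sentence))
```

In this sketch, a sentence containing tokens close to the query terms in embedding space scores higher than one with only unrelated tokens, which mirrors the role of the attention mechanism described in the abstract.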
