Investigating Post-pretraining Representation Alignment for Cross-Lingual Question Answering

2021-09-24 · EMNLP (MRQA) 2021

Fahim Faisal, Antonios Anastasopoulos

Abstract

Human knowledge is collectively encoded in the roughly 6500 languages spoken around the world, but it is not distributed equally across languages. Hence, for information-seeking question answering (QA) systems to adequately serve speakers of all languages, they need to operate cross-lingually. In this work we investigate the capabilities of multilingually pre-trained language models on cross-lingual QA. We find that explicitly aligning the representations across languages with a post-hoc fine-tuning step generally leads to improved performance. We additionally investigate the effect of data size as well as the language choice in this fine-tuning step, also releasing a dataset for evaluating cross-lingual QA systems. Code and dataset are publicly available here: https://github.com/ffaisal93/aligned_qa
