Few-Shot Multilingual Open-Domain QA from 5 Examples

2025-02-27

Fan Jiang, Tom Drummond, Trevor Cohn

Code

github.com/Fantabulous-J/FSMODQA
Official, PyTorch

Abstract

Recent approaches to multilingual open-domain question answering (MLODQA) have achieved promising results given abundant language-specific training data. However, the considerable annotation cost limits the application of these methods for underrepresented languages. We introduce a few-shot learning approach to synthesise large-scale multilingual data from large language models (LLMs). Our method begins with large-scale self-supervised pre-training using WikiData, followed by training on high-quality synthetic multilingual data generated by prompting LLMs with few-shot supervision. The final model, FsModQA, significantly outperforms existing few-shot and supervised baselines in MLODQA and cross-lingual and monolingual retrieval. We further show our method can be extended for effective zero-shot adaptation to new languages through a cross-lingual prompting strategy with only English-supervised data, making it a general and applicable solution for MLODQA tasks without costly large-scale annotation.
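The few-shot prompting step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Example` class, the `build_few_shot_prompt` function, the prompt wording, and the placeholder data are all assumptions made for illustration. It only shows how five annotated examples might be concatenated into a prompt that asks an LLM to produce a question-answer pair for a new passage in a target language.

```python
from dataclasses import dataclass


@dataclass
class Example:
    # Hypothetical container for one annotated few-shot example.
    passage: str
    question: str
    answer: str


def build_few_shot_prompt(examples, passage, language):
    """Assemble a few-shot prompt asking an LLM for a QA pair about `passage`.

    The instruction wording is an assumption, not taken from the paper.
    """
    header = (
        f"Write a question and a short answer in {language} "
        "based on the passage.\n\n"
    )
    # Each demonstration shows the passage followed by its question and answer.
    shots = "".join(
        f"Passage: {ex.passage}\nQuestion: {ex.question}\nAnswer: {ex.answer}\n\n"
        for ex in examples
    )
    # The prompt ends with an open "Question:" slot for the model to complete.
    return header + shots + f"Passage: {passage}\nQuestion:"


# Five placeholder examples stand in for the few-shot supervision.
examples = [
    Example(f"passage {i}", f"question {i}?", f"answer {i}") for i in range(5)
]
prompt = build_few_shot_prompt(examples, "new passage", "Finnish")
```

The resulting string would be sent to an LLM of choice; the model's completion would then be parsed into a synthetic question-answer pair for training.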

Tasks

Few-Shot Learning, Open-Domain Question Answering, Question Answering
