SOTAVerified

Multi-Task Dense Retrieval via Model Uncertainty Fusion for Open-Domain Question Answering

2021-11-01Findings (EMNLP) 2021Code Available0· sign in to hype

Minghan Li, Ming Li, Kun Xiong, Jimmy Lin

Code Available — Be the first to reproduce this paper.

Reproduce

Code

Abstract

Multi-task dense retrieval models can be used to retrieve documents from a common corpus (e.g., Wikipedia) for different open-domain question-answering (QA) tasks. However, (CITATION) shows that jointly learning different QA tasks with one dense model is not always beneficial due to corpus inconsistency. For example, SQuAD only focuses on a small set of Wikipedia articles while datasets like NQ and Trivia cover more entries, and joint training on their union can cause performance degradation. To solve this problem, we propose to train individual dense passage retrievers (DPR) for different tasks and aggregate their predictions during test time, where we use uncertainty estimation as weights to indicate how probable a specific query belongs to each expert’s expertise. Our method reaches state-of-the-art performance on 5 benchmark QA datasets, with up to 10% improvement in top-100 accuracy compared to a joint-training multi-task DPR on SQuAD. We also show that our method handles corpus inconsistency better than the joint-training DPR on a mixed subset of different QA datasets. Code and data are available at https://github.com/alexlimh/DPR_MUF.

Tasks

Reproductions