Salient Phrase Aware Dense Retrieval: Can a Dense Retriever Imitate a Sparse One?

2021-11-16 · ACL ARR November 2021

Anonymous


Abstract

Despite their recent popularity and well-known advantages, dense retrievers still lag behind sparse methods such as BM25 in their ability to reliably match salient phrases and rare entities in the query. It has been argued that this is an inherent limitation of dense models. We disprove this claim by introducing the Salient Phrase Aware Retriever (SPAR), a dense retriever with the lexical matching capacity of a sparse model. In particular, we show that a dense retriever Λ can be trained to imitate a sparse one, and SPAR is built by augmenting a standard dense retriever with Λ. When evaluated on five open-domain question answering datasets and the MS MARCO passage retrieval task, SPAR sets a new state of the art for dense and sparse retrievers and can match or exceed the performance of more complicated dense-sparse hybrid systems.
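The abstract does not say how the standard dense retriever is augmented with Λ. As a minimal sketch, assuming the two encoders' vectors are simply concatenated and that the Λ part of the query is scaled by a hypothetical `weight`, a single inner product over the concatenated vectors equals the base score plus the weighted Λ score. All function names and the weighting scheme below are illustrative assumptions, not the paper's implementation.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def concat_query(base_q: Vector, lam_q: Vector, weight: float) -> List[float]:
    """Query vector: base embedding followed by the weighted Λ embedding.

    `weight` is a hypothetical knob for how much the lexical-style model
    contributes; it is an assumption of this sketch.
    """
    return list(base_q) + [weight * x for x in lam_q]


def concat_passage(base_p: Vector, lam_p: Vector) -> List[float]:
    """Passage vector: base embedding followed by the unweighted Λ embedding."""
    return list(base_p) + list(lam_p)


def rank(query_vec: Vector, passage_vecs: Sequence[Vector]) -> List[int]:
    """Return passage indices sorted by descending inner-product score."""
    scores = [dot(query_vec, p) for p in passage_vecs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

Under this assumption the combined model is still a single dense retriever: the concatenated passage vectors can be indexed once, and retrieval needs only one inner-product search, with no separate sparse index at query time.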

Tasks

Open-Domain Question Answering, Passage Retrieval, Question Answering, Retrieval
