Hierarchical corpus encoder: Fusing generative retrieval and dense indices

2025-02-26

Tongfei Chen, Ankita Sharma, Adam Pauls, Benjamin Van Durme

Abstract

Generative retrieval employs sequence models to conditionally generate document IDs given a query (DSI (Tay et al., 2022); NCI (Wang et al., 2022); inter alia). While this has improved zero-shot retrieval performance, supporting documents not seen during training remains a challenge. We find that the performance of generative retrieval stems from contrastive training between sibling nodes in a document hierarchy. This motivates our proposal, the hierarchical corpus encoder (HCE), which can be realized with traditional dense encoders. Our experiments show that HCE outperforms generative retrieval models in both unsupervised zero-shot and supervised settings, while also allowing documents to be easily added to and removed from the index.
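
The sibling-contrastive idea can be illustrated with a minimal sketch. It assumes a given tree of document clusters with one embedding per node (for example, cluster centroids). The names `sibling_contrastive_loss`, `children`, and `node_emb` are hypothetical, and this is not the authors' HCE implementation. At each level of the path from the root to a document, the query embedding is scored against the correct node and its siblings, and the loss is the negative log softmax probability of the correct node. This roughly mirrors a generative retriever that decodes a hierarchical docid token by token, where each constrained step chooses among the children of the current prefix.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner-product relevance score between two dense vectors."""
    return sum(x * y for x, y in zip(a, b))


def log_softmax_at(scores: List[float], index: int) -> float:
    """Numerically stable log-probability of scores[index] under a softmax."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return scores[index] - log_z


def sibling_contrastive_loss(
    query: Vector,
    path: List[str],
    children: Dict[str, List[str]],
    node_emb: Dict[str, Vector],
) -> float:
    """Hypothetical illustration: sum over levels of -log p(target | parent),
    where each softmax ranges only over the parent's children, i.e. the
    correct node and its siblings act as the only negatives."""
    loss = 0.0
    for parent, target in zip(path, path[1:]):
        siblings = children[parent]
        scores = [dot(query, node_emb[c]) for c in siblings]
        loss -= log_softmax_at(scores, siblings.index(target))
    return loss


# Toy hierarchy (assumed, for illustration): two clusters, three documents.
children = {
    "root": ["A", "B"],
    "A": ["doc1", "doc2"],
    "B": ["doc3"],
}
node_emb = {
    "A": [1.0, 0.0],
    "B": [0.0, 1.0],
    "doc1": [1.0, 0.0],
    "doc2": [0.5, 0.5],
    "doc3": [0.0, 1.0],
}

loss_doc1 = sibling_contrastive_loss([1.0, 0.0], ["root", "A", "doc1"], children, node_emb)
# doc3 has no siblings, so its leaf level contributes zero loss.
loss_doc3 = sibling_contrastive_loss([0.0, 1.0], ["root", "B", "doc3"], children, node_emb)
print(loss_doc1, loss_doc3)
```

Because every score in this sketch is a dot product between a query embedding and a node embedding, retrieval can run against an ordinary dense index, which is why adding or removing a document is cheap in principle: embed it and insert or delete its leaf. How HCE actually builds the hierarchy and its node embeddings is left open here.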
