Multi-objective Representation Learning for Scientific Document Retrieval

2022-10-01 · sdp (COLING) 2022 · Code available

Mathias Parisot, Jakub Zavrel

Abstract

Existing dense retrieval models for scientific documents have been optimized either for retrieval by short queries or for document similarity, but usually not for both. In this paper, we explore combining multiple objectives to obtain a single representation model that balances both modes of dense retrieval. We combine the relevance judgements from MS MARCO with the citation similarity of SPECTER and the self-supervised objective of independent cropping. We also consider adding training data from document co-citation in a sentence context, as well as domain-specific synthetic data. We show that combining multiple objectives yields models that generalize well across different benchmark tasks, improving by up to 73% over models trained on a single objective.
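
As a rough illustration of what combining objectives can look like, the sketch below computes a contrastive (InfoNCE-style) loss per objective and takes a weighted sum. It is not taken from the paper or its code: the function names `info_nce` and `combined_loss`, the temperature, the objective labels, and the weights are all hypothetical, and the abstract does not say how the objectives are actually weighted or batched.

```python
import math


def info_nce(query, candidates, positive_index, temperature=0.05):
    """InfoNCE loss for one query vector against a list of candidate vectors.

    Assumes all vectors are non-zero lists of floats with the same length.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)

    logits = [cosine(query, c) / temperature for c in candidates]
    # Log-sum-exp with the max subtracted for numerical stability.
    top = max(logits)
    log_z = top + math.log(sum(math.exp(l - top) for l in logits))
    return log_z - logits[positive_index]


def combined_loss(losses, weights):
    """Weighted sum of per-objective losses (weights are an assumption)."""
    return sum(weights[name] * value for name, value in losses.items())


# Toy 2-d embeddings; in practice these would come from a shared encoder.
query = [1.0, 0.0]
relevant = [0.9, 0.1]
irrelevant = [0.0, 1.0]

losses = {
    # Hypothetical labels for the three kinds of objective named in the abstract.
    "query_relevance": info_nce(query, [relevant, irrelevant], 0),
    "citation_similarity": info_nce(relevant, [query, irrelevant], 0),
    "independent_cropping": info_nce(query, [query, irrelevant], 0),
}
weights = {
    "query_relevance": 1.0,
    "citation_similarity": 1.0,
    "independent_cropping": 0.5,
}
total = combined_loss(losses, weights)
```

Because every objective updates the same encoder, the weights control how strongly each mode of retrieval shapes the shared representation; equal or hand-tuned weights are only one possible choice.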