CitRet: A Hybrid Model for Cited Text Span Retrieval

2022-10-01 · COLING 2022

Amit Pandey, Avani Gupta, Vikram Pudi

Code

github.com/amitpandey-research/citret_public
Official · In paper · PyTorch

Abstract

This paper aims to identify the text spans in a reference paper that correspond to a given citance in the citing paper, a task we refer to as cited text span retrieval (CTSR). Most current methods approach this task with off-the-shelf pre-trained deep learning models such as SciBERT. Although these models are pre-trained on large datasets, they underperform in out-of-domain settings. We introduce CitRet, a novel hybrid model for CTSR that leverages the distinctive semantic and syntactic structure of scientific documents. This lets us finetune on significantly less data: only 1040 documents. Our model augments mildly trained SBERT-based contextual embeddings with pre-trained non-contextual Word2Vec embeddings to compute semantic textual similarity. We evaluate the model on the CL-SciSumm shared tasks, where it improves on state-of-the-art results by over 15% in F1 score.
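To make the idea of combining contextual and non-contextual embeddings concrete, here is a minimal sketch of hybrid similarity ranking. It is not the authors' implementation: the abstract does not say how the two embedding types are combined, so the weighted sum of cosine similarities, the `alpha` weight, and the names `hybrid_score` and `rank_spans` are all assumptions made for illustration. The toy two-dimensional vectors stand in for real SBERT and Word2Vec sentence embeddings.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
# Each sentence is represented by a (contextual, non-contextual) embedding pair.
EmbeddingPair = Tuple[Vector, Vector]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def hybrid_score(query: EmbeddingPair, candidate: EmbeddingPair,
                 alpha: float = 0.5) -> float:
    """Weighted sum of contextual and non-contextual cosine similarity.

    The weighted-sum form and `alpha` are illustrative assumptions,
    not details taken from the paper.
    """
    ctx_q, static_q = query
    ctx_c, static_c = candidate
    return alpha * cosine(ctx_q, ctx_c) + (1.0 - alpha) * cosine(static_q, static_c)


def rank_spans(citance: EmbeddingPair, candidates: List[EmbeddingPair],
               alpha: float = 0.5, top_k: int = 2) -> List[Tuple[int, float]]:
    """Return (candidate index, score) for the top_k highest-scoring spans."""
    scored = [(i, hybrid_score(citance, cand, alpha))
              for i, cand in enumerate(candidates)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy data: placeholders for SBERT (first) and Word2Vec (second) embeddings.
citance = ([1.0, 0.0], [0.0, 1.0])
candidates = [
    ([1.0, 0.0], [0.0, 1.0]),  # matches the citance in both embedding spaces
    ([0.0, 1.0], [0.0, 1.0]),  # matches only in the non-contextual space
    ([0.0, 1.0], [1.0, 0.0]),  # matches in neither space
]
top = rank_spans(citance, candidates)
```

In a real pipeline the vectors would come from an SBERT encoder and from averaged Word2Vec word vectors, and `alpha` would be tuned on held-out data. With the toy data above, the candidate that agrees with the citance in both spaces ranks first.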

Tasks

Retrieval, Semantic Textual Similarity
