CodRED: A Cross-Document Relation Extraction Dataset for Acquiring Knowledge in the Wild

2021-11-01 · EMNLP 2021 · Code Available

Yuan Yao, Jiaju Du, Yankai Lin, Peng Li, Zhiyuan Liu, Jie Zhou, Maosong Sun

Code

github.com/thunlp/codred
Official · In paper · PyTorch · ★ 32

Abstract

Existing relation extraction (RE) methods typically focus on extracting relational facts between entity pairs within single sentences or documents. However, a large quantity of relational facts in knowledge bases can only be inferred across documents in practice. In this work, we present the problem of cross-document RE, making an initial step towards knowledge acquisition in the wild. To facilitate the research, we construct the first human-annotated cross-document RE dataset, CodRED. Compared to existing RE datasets, CodRED presents two key challenges: given two entities, (1) it requires finding the relevant documents that can provide clues for identifying their relations; and (2) it requires reasoning over multiple documents to extract the relational facts. We conduct comprehensive experiments to show that CodRED is challenging for existing RE methods, including strong BERT-based models.

Tasks

Relation · Relation Extraction
