SOTAVerified

Cross-Lingual UMLS Named Entity Linking using UMLS Dictionary Fine-Tuning

2021-11-16ACL ARR November 2021Unverified0· sign in to hype

Anonymous

Unverified — Be the first to reproduce this paper.

Reproduce

Abstract

We study cross-lingual UMLS named entity linking, where mentions in a given source language are mapped to UMLS concepts, most of which are labeled in English. We propose a general solution that can be easily adapted to any source language and demonstrate the method on Hebrew documents. Our cross-lingual framework includes an offline unsupervised construction of a bilingual UMLS dictionary and a per-document pipeline which identifies UMLS candidate mentions and uses a fine-tuned pretrained transformer language model to filter candidates according to context. Our method exploits a small dataset of manually annotated UMLS mentions in the source language and uses this supervised data in two ways: to extend the unsupervised UMLS dictionary and to fine-tune the contextual filtering of candidate mentions in full documents. Our method addresses cross-lingual UMLS NEL in a low resource setting, where the ontology is large, there is a lack of descriptive text defining most entities, and labeled data can only cover a small portion of the ontology. We demonstrate results of our approach on both Hebrew and English. We achieve new state-of-the-art results on the Hebrew Camoni corpus, +8.9 F1 on average across three communities in the dataset. We also achieve new SOTA on the English dataset MedMentions with +7.3 F1.

Tasks

Reproductions