Cross-lingual NIL Entity Clustering for Low-resource Languages

2019-06-01 · WS 2019

Kevin Blissett, Heng Ji


Abstract

Clustering unlinkable entity mentions across documents in multiple languages (cross-lingual NIL clustering) is an important task within Entity Discovery and Linking (EDL). The EDL community has largely neglected this task because baselines built on simple edit distance or other heuristics are hard to outperform. We propose a novel approach that encodes the orthographic similarity of mentions with a Recurrent Neural Network (RNN) architecture, trained with a procedure adapted from the one-shot facial recognition literature. We also run several exploratory probing tasks on our name encodings to determine which specific types of information our model is likely to encode. Experiments show that our approach provides up to a 6.6% absolute CEAFm F-score improvement over state-of-the-art methods and successfully captures phonological relations across languages.
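The abstract contrasts the proposed model with simple edit-distance baselines. As a rough illustration of what such a baseline can look like (a minimal sketch, not the baseline evaluated in the paper), the Python below groups NIL mentions by single-link clustering over normalized Levenshtein distance. The function names and the 0.3 threshold are hypothetical choices.

```python
# Hypothetical edit-distance baseline for illustration; not the paper's baseline.


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if chars match)
            ))
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1] by the longer string's length."""
    if not a and not b:
        return 0.0
    return levenshtein(a, b) / max(len(a), len(b))


def cluster_nil_mentions(mentions: list[str], threshold: float = 0.3) -> list[list[str]]:
    """Single-link clustering with union-find: two mentions end up together if a
    chain of pairs links them, each pair within `threshold` (an assumed value)."""
    parent = list(range(len(mentions)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(mentions)):
        for j in range(i + 1, len(mentions)):
            if normalized_distance(mentions[i].lower(), mentions[j].lower()) <= threshold:
                parent[find(i)] = find(j)

    clusters: dict[int, list[str]] = {}
    for i, m in enumerate(mentions):
        clusters.setdefault(find(i), []).append(m)
    return list(clusters.values())
```

For example, `cluster_nil_mentions(["Mohammed", "Mohamed", "Muhammad", "Nairobi"])` puts the three spelling variants in one cluster and leaves "Nairobi" on its own. Because every character edit costs the same, a baseline like this has no notion of which spelling differences are phonologically plausible, which is the kind of relation the abstract says the learned encodings capture.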
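The abstract says the RNN is trained with a procedure adapted from the one-shot facial recognition literature but does not name the objective on this page. One widely used objective from that literature is the triplet loss used in FaceNet; the sketch below assumes that objective purely for illustration and should not be read as the paper's exact loss. The vectors are plain Python lists standing in for RNN name encodings, and the 0.2 margin is an assumed value.

```python
import math

# Assumption: FaceNet-style triplet loss; the paper's exact training objective may differ.


def l2_normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length (FaceNet constrains embeddings to the unit sphere)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def squared_distance(u: list[float], v: list[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor: list[float], positive: list[float],
                 negative: list[float], margin: float = 0.2) -> float:
    """Hinge on d(a, p) - d(a, n) + margin: zero once the negative is at least
    `margin` farther from the anchor than the positive is."""
    a, p, n = (l2_normalize(x) for x in (anchor, positive, negative))
    return max(squared_distance(a, p) - squared_distance(a, n) + margin, 0.0)


def batch_triplet_loss(triplets: list[tuple[list[float], list[float], list[float]]],
                       margin: float = 0.2) -> float:
    """Mean triplet loss over a batch of (anchor, positive, negative) encodings."""
    return sum(triplet_loss(a, p, n, margin) for a, p, n in triplets) / len(triplets)
```

In this setup an anchor mention and a positive (for instance, the same entity written in another language) are pulled together, while a negative mention of a different entity is pushed at least `margin` farther away; once a triplet satisfies the margin it contributes zero loss.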
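Results are reported as CEAFm F-score, the mention-based variant of the Constrained Entity-Alignment F-measure. It finds the one-to-one alignment between gold (key) and system (response) clusters that maximizes the total number of shared mentions, then divides that total by the number of response mentions for precision and by the number of key mentions for recall. The sketch below finds the alignment by brute force over permutations, which is only practical for a handful of clusters; the function name is hypothetical, and practical scorers solve the alignment with the Kuhn-Munkres (Hungarian) algorithm instead.

```python
from itertools import permutations

# Hypothetical helper for illustration: brute-force CEAFm, suitable for tiny inputs only.


def ceaf_m(key: list[set[str]], response: list[set[str]]) -> tuple[float, float, float]:
    """Return (precision, recall, F1) under mention-based CEAF, where the
    similarity of a key cluster and a response cluster is the number of
    mentions they share."""
    size = max(len(key), len(response))
    # Pad with empty clusters so every partial one-to-one alignment is a permutation.
    k = list(key) + [set()] * (size - len(key))
    r = list(response) + [set()] * (size - len(response))
    best = max(
        sum(len(k[i] & r[j]) for i, j in enumerate(perm))
        for perm in permutations(range(size))
    )
    key_total = sum(len(c) for c in key)
    resp_total = sum(len(c) for c in response)
    precision = best / resp_total if resp_total else 0.0
    recall = best / key_total if key_total else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

With key `[{"m1", "m2", "m3"}, {"m4", "m5"}]` and response `[{"m1", "m2"}, {"m3", "m4", "m5"}]`, the best alignment shares 4 of the 5 mentions, so precision, recall and F1 all come out to about 0.8.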

Tasks

Clustering
