Creating and Curating a Cross-Language Person-Entity Linking Collection

2012-05-01 · LREC 2012

Dawn Lawrie, James Mayfield, Paul McNamee, Douglas Oard

Abstract

To stimulate research in cross-language entity linking, we present a new test collection for evaluating the accuracy of cross-language entity linking in twenty-one languages. This paper describes an efficient way to create and curate such a collection, judiciously exploiting existing language resources. Queries are created by semi-automatically identifying person names on the English side of a parallel corpus, using judgments obtained through crowdsourcing to identify the entity corresponding to the name, and projecting the English name onto the non-English document using word alignments. Name projections are then curated, again through crowdsourcing. This technique resulted in the first publicly available multilingual cross-language entity linking collection. The collection includes approximately 55,000 queries, comprising between 875 and 4,329 queries for each of twenty-one non-English languages.
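The name-projection step mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `project_span`, the representation of word alignments as a set of (English index, target index) pairs, and the toy sentence pair are all assumptions made for illustration. The crowdsourced curation that follows projection is not modeled here.

```python
from typing import List, Optional, Set, Tuple


def project_span(
    src_span: Tuple[int, int],
    alignments: Set[Tuple[int, int]],
    tgt_tokens: List[str],
) -> Optional[str]:
    """Project an English token span [start, end) onto the target sentence.

    `alignments` holds hypothetical (english_index, target_index) pairs,
    as a word aligner might produce. Returns None if no token in the
    span is aligned to anything.
    """
    start, end = src_span
    # Collect every target position aligned to a token inside the span.
    tgt_idx = sorted({t for s, t in alignments if start <= s < end})
    if not tgt_idx:
        return None
    # Take the smallest contiguous target range covering all aligned tokens.
    return " ".join(tgt_tokens[tgt_idx[0] : tgt_idx[-1] + 1])


# Toy example with an invented name (illustrative data only).
english = ["Maria", "Silva", "spoke", "in", "Brussels", "yesterday"]
spanish = ["Ayer", "habló", "Maria", "Silva", "en", "Bruselas"]
alignments = {(0, 2), (1, 3), (2, 1), (3, 4), (4, 5), (5, 0)}

projected = project_span((0, 2), alignments, spanish)
print(projected)
```

Taking the covering range is a simplification: when alignments are noisy, the range can include tokens that are not part of the name, which is the kind of error a curation pass would need to catch.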
