EuroSense: Automatic Harvesting of Multilingual Sense Annotations from Parallel Text

2017-07-01 · ACL 2017

Claudio Delli Bovi, Jose Camacho-Collados, Alessandro Raganato, Roberto Navigli

Abstract

Parallel corpora are widely used in a variety of Natural Language Processing tasks, from Machine Translation to cross-lingual Word Sense Disambiguation, where parallel sentences can be exploited to automatically generate high-quality sense annotations on a large scale. In this paper we present EuroSense, a multilingual sense-annotated resource based on the joint disambiguation of the Europarl parallel corpus, with almost 123 million sense annotations for over 155 thousand distinct concepts and entities from a language-independent unified sense inventory. We evaluate the quality of our sense annotations intrinsically and extrinsically, showing their effectiveness as training data for Word Sense Disambiguation.
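The abstract's central idea is that aligned translations in a parallel corpus constrain one another's senses once every language maps into the same language-independent sense inventory. The toy voting sketch below illustrates that intuition. It is not the EuroSense pipeline. The inventory, the `toy:` sense IDs, and the functions `disambiguate_aligned` and `annotate_group` are all hypothetical, and word alignment across languages is assumed to be given.

```python
from collections import Counter

# Hypothetical toy inventory: (language, lemma) -> candidate sense IDs drawn
# from one shared, language-independent inventory. IDs are made up for illustration.
INVENTORY = {
    ("en", "bank"): {"toy:financial_institution", "toy:river_bank"},
    ("it", "banca"): {"toy:financial_institution"},
    ("de", "bank"): {"toy:financial_institution", "toy:bench"},
    ("es", "orilla"): {"toy:river_bank"},
}


def disambiguate_aligned(words):
    """Pick one sense for a group of aligned words (one per language).

    words: list of (language, lemma) pairs assumed to translate each other.
    Returns the sense supported by the most translations (ties broken
    alphabetically), or None unless at least two translations agree.
    """
    votes = Counter()
    for key in words:
        for sense in INVENTORY.get(key, set()):
            votes[sense] += 1
    if not votes:
        return None
    # Highest vote count first; alphabetical order makes ties deterministic.
    sense, count = min(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return sense if count >= 2 else None


def annotate_group(words):
    """Label every word in the group whose own candidates include the chosen sense."""
    sense = disambiguate_aligned(words)
    if sense is None:
        return {}
    return {key: sense for key in words if sense in INVENTORY.get(key, set())}


print(annotate_group([("en", "bank"), ("es", "orilla")]))
```

In this toy setting, English "bank" is ambiguous when it appears alone. The Spanish translation "orilla" selects the river-bank reading, and both words then receive the same language-independent annotation. Cross-lingual agreement of this kind is what allows sense labels to be produced for several languages at once.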

Tasks

Entity Linking, Machine Translation, Translation, Word Sense Disambiguation