Tough Tables: Carefully Evaluating Entity Linking for Tabular Data

2020-11-01, International Semantic Web Conference (ISWC) 2020

Vincenzo Cutrona, Federico Bianchi, Ernesto Jimenez-Ruiz, Matteo Palmonari


Code

github.com/vcutrona/tough-tables

Abstract

Table annotation is a key task for improving Web querying and for supporting Knowledge Graph population from legacy sources (tables). Last year, the SemTab challenge was introduced to unify different efforts to evaluate table annotation algorithms by providing a common interface and several general-purpose datasets as ground truth. The SemTab dataset is useful for gaining a general understanding of how these algorithms work, and the challenge organizers added some artificial noise to the data to make annotation trickier. However, it is hard to analyze specific aspects automatically. For example, the ambiguity of names at the entity level can largely affect annotation quality. In this paper, we propose a novel dataset to complement the datasets proposed by SemTab. The dataset consists of a set of high-quality, manually curated tables with non-obviously linkable cells, i.e., cells whose values are ambiguous names, typos, and misspelled entity names not appearing in the current version of the SemTab dataset. These challenges are particularly relevant for ingesting structured legacy sources into existing knowledge graphs. Evaluations run on this dataset show that ambiguity is a key problem for entity linking algorithms and point to a promising direction for future work in the field.
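To make "non-obviously linkable cells" concrete, here is a toy Python sketch. It is not the authors' method and not part of the Tough Tables code. It assumes a hypothetical in-memory label index (`LABEL_INDEX`, with made-up `ex:` identifiers) in place of a real knowledge graph lookup service, and it uses the standard library's `difflib.get_close_matches` as a stand-in for typo-tolerant candidate retrieval. The sketch shows that exact lookup misses a misspelled cell, while fuzzy lookup recovers it but still returns several candidate entities, so a disambiguation step is still needed.

```python
import difflib

# Hypothetical toy label index: surface label -> candidate entity IDs.
# The "ex:" IDs are illustrative placeholders, not real KG identifiers.
LABEL_INDEX = {
    "Paris": ["ex:Paris_France", "ex:Paris_Texas", "ex:Paris_Hilton"],
    "Rome": ["ex:Rome_Italy", "ex:Rome_Georgia"],
    "Berlin": ["ex:Berlin_Germany"],
}


def exact_candidates(cell):
    """Naive lookup: only an exact label match yields candidates."""
    return LABEL_INDEX.get(cell, [])


def fuzzy_candidates(cell, cutoff=0.8):
    """Typo-tolerant lookup: collect entities of the closest known labels."""
    labels = difflib.get_close_matches(cell, LABEL_INDEX.keys(), n=3, cutoff=cutoff)
    return [entity for label in labels for entity in LABEL_INDEX[label]]


if __name__ == "__main__":
    # A clean cell, a misspelled cell, and an ambiguous cell.
    for cell in ["Berlin", "Pariss", "Rome"]:
        exact = exact_candidates(cell)
        fuzzy = fuzzy_candidates(cell)
        print(cell, len(exact), len(fuzzy))
    # Berlin 1 1
    # Pariss 0 3
    # Rome 2 2
```

The misspelled `Pariss` gets no exact candidates but three fuzzy ones, and `Rome` maps to two entities even with a perfect match. Picking the right candidate in such cases is the ambiguity problem the abstract identifies as key for entity linking algorithms.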

Tasks

Cell Entity Annotation, Column Type Annotation, Entity Linking, Knowledge Graphs, Table Annotation
