
BiodivTab: Semantic Table Annotation Benchmark Construction, Analysis, and New Additions

2023-01-10 · Ontology Matching @ ISWC 2022

Nora Abdelmageed, Sirko Schindler, Birgitta König-Ries


Abstract

Systems that annotate tabular data semantically have received increasing attention from the community in recent years; the process is commonly known as Semantic Table Annotation (STA). Its objective is to map individual table elements to their counterparts in a Knowledge Graph (KG): individual cells and columns are assigned to KG entities and classes to disambiguate their meaning. STA systems achieve high scores on existing, synthetic benchmarks but often struggle on real-world datasets. Realistic evaluation benchmarks are therefore needed to advance the field. In this paper, we detail the construction pipeline of BiodivTab, the first benchmark based on real-world data from the biodiversity domain, compare it with existing benchmarks, and highlight common data characteristics and challenges in the field. BiodivTab is publicly available and comprises 50 tables, a mixture of real and augmented samples from biodiversity datasets. It was used in the SemTab 2021 challenge, where participants achieved F1-scores of at most ∼60% across the individual annotation tasks. These results show that domain-specific benchmarks are more challenging for state-of-the-art systems than synthetic datasets.
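To make the reported numbers concrete, the sketch below shows how an F1-score over cell-to-entity annotations could be computed. This is an illustrative stand-in, not the official SemTab scorer; the data-structure choice (mapping a `(table, row, column)` position to a KG entity IRI) and the sample Wikidata IRIs are assumptions for the example.

```python
def f1_score(gold, predicted):
    """F1 over annotations, where gold and predicted map a
    (table_id, row, col) position to a KG entity IRI.
    Precision counts over what was predicted; recall over the gold set."""
    correct = sum(1 for pos, ent in predicted.items() if gold.get(pos) == ent)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical gold standard: two cells annotated with Wikidata entities.
gold = {
    ("t1", 0, 0): "http://www.wikidata.org/entity/Q25243",
    ("t1", 1, 0): "http://www.wikidata.org/entity/Q156895",
}
# A system's predictions: one correct, one wrong entity.
pred = {
    ("t1", 0, 0): "http://www.wikidata.org/entity/Q25243",
    ("t1", 1, 0): "http://www.wikidata.org/entity/Q42",
}
print(f1_score(gold, pred))  # 0.5
```

Partial-credit variants (e.g. scoring ancestors of the correct class in the type-annotation task) would extend the `correct` check, but the precision/recall skeleton stays the same.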
