
TorchicTab: Semantic Table Annotation with Wikidata and Language Models

2023-11-20 · SemTab@ISWC 2023

Ioannis Dasoulas, Duo Yang, Xuemin Duan, Anastasia Dimou


Abstract

An abundance of tabular data exists and is used by a wide range of applications. However, a large portion of these data lacks the semantic information necessary for users and machines to properly understand them. This lack of semantic understanding of tables impedes their use in data analytics pipelines. Solutions to semantically interpret tables exist, but they focus on specific annotation tasks and table types and rely on large knowledge bases, making them difficult to reuse in real-world settings. Thus, more robust systems that produce more precise annotations and adapt to different table types are needed. The Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab) was introduced to benchmark semantic table interpretation systems by evaluating them over diverse datasets and tasks. In this paper, we introduce TorchicTab, a versatile semantic table interpretation system able to annotate tables with varied structures by using either an external knowledge graph, such as Wikidata, or annotated tables with pre-defined terms for training. We evaluate our proposed system on the different annotation tasks of the SemTab challenge. The results show that our system can produce accurate annotations for different tasks across varied datasets.
