
A Relation Extraction Dataset for Knowledge Extraction from Web Tables

2022-10-01 · COLING 2022 · Code Available

Siffi Singh, Alham Fikri Aji, Gaurav Singh, Christos Christodoulopoulos


Abstract

Relational web tables are a significant source of structured information and are widely used for relation extraction and for populating knowledge graphs with facts. To transform web-table data into knowledge, we need to identify the relations that hold between column pairs. Currently, only a handful of publicly available datasets have relations annotated against natural web tables. Most datasets are either constructed from synthetic tables that lack valuable metadata or are too small to serve as a challenging evaluation set. In this paper, we present REDTab, the largest natural-table relation extraction dataset. We annotated ~9K tables and ~22K column pairs using crowdsourced annotators from MTurk, giving 50x more column pairs than the existing human-annotated benchmark. Our test set is specifically designed to be challenging, as observed in our experimental results using TaBERT. We publicly release REDTab as a benchmark for evaluating relation extraction.
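To make the unit of annotation concrete, the sketch below enumerates the column pairs of a small table, which is the kind of item a relation label would attach to. It is illustrative only: the `ColumnPair` class, the `column_pairs` helper, the toy table, and the relation names are all hypothetical and are not taken from REDTab's actual schema or release format.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class ColumnPair:
    """Hypothetical container for one candidate column pair."""
    left_header: str
    right_header: str
    cells: tuple  # one (left cell, right cell) tuple per table row


def column_pairs(headers, rows):
    """Enumerate every unordered pair of columns in a table."""
    pairs = []
    for i, j in combinations(range(len(headers)), 2):
        cells = tuple((row[i], row[j]) for row in rows)
        pairs.append(ColumnPair(headers[i], headers[j], cells))
    return pairs


# Toy table, invented for illustration.
headers = ["Film", "Director", "Year"]
rows = [
    ["Alien", "Ridley Scott", "1979"],
    ["Heat", "Michael Mann", "1995"],
]

pairs = column_pairs(headers, rows)

# Hypothetical relation labels; a pair with no meaningful relation gets None.
labels = {
    ("Film", "Director"): "directed_by",
    ("Film", "Year"): "release_year",
}
annotated = [
    (p, labels.get((p.left_header, p.right_header))) for p in pairs
]
```

A table with n columns yields n(n-1)/2 unordered pairs, so the number of candidate pairs grows much faster than the number of tables, and many pairs may carry no relation at all.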
