Enriching the E2E dataset

2021-08-01 · INLG (ACL) 2021 · Code Available

Thiago Castro Ferreira, Helena Vaz, Brian Davis, Adriana Pagano

Abstract

This study introduces an enriched version of the E2E dataset, one of the most popular language resources for data-to-text NLG. We extract intermediate representations for popular pipeline tasks such as discourse ordering, text structuring, lexicalization and referring expression generation, enabling researchers to rapidly develop and evaluate their data-to-text pipeline systems. The intermediate representations are extracted by aligning non-linguistic and text representations through a process called delexicalization, which consists in replacing input referring expressions to entities/attributes with placeholders. The enriched dataset is publicly available.
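
To make the delexicalization step concrete, here is a minimal sketch in Python. It assumes the slot[value] meaning-representation format of the original E2E dataset and uses exact, case-insensitive string matching. The function names (`parse_mr`, `delexicalize`) and the `__ATTR__` placeholder style are hypothetical choices for illustration, not the format or alignment procedure of the enriched dataset.

```python
import re


def parse_mr(mr: str) -> dict[str, str]:
    """Parse an E2E-style meaning representation, e.g. 'name[The Eagle], food[French]'."""
    return {
        m.group(1).strip(): m.group(2).strip()
        for m in re.finditer(r"([^,\[\]]+)\[([^\]]*)\]", mr)
    }


def delexicalize(mr: str, text: str) -> tuple[str, dict[str, str]]:
    """Replace attribute values found verbatim in `text` with placeholders.

    Returns the delexicalized template and a placeholder -> value mapping.
    Placeholder naming (__ATTR__) is an assumption made for this sketch.
    """
    slots = parse_mr(mr)
    template = text
    mapping: dict[str, str] = {}
    # Handle longer values first so a short value cannot clobber part of a longer one.
    for attr, value in sorted(slots.items(), key=lambda kv: -len(kv[1])):
        if not value:
            continue
        placeholder = "__" + re.sub(r"\W+", "_", attr).upper() + "__"
        pattern = re.compile(re.escape(value), flags=re.IGNORECASE)
        if pattern.search(template):
            template = pattern.sub(lambda _: placeholder, template)
            mapping[placeholder] = value
    return template, mapping


if __name__ == "__main__":
    mr = "name[The Eagle], eatType[coffee shop], food[French]"
    text = "The Eagle is a French coffee shop."
    template, mapping = delexicalize(mr, text)
    print(template)  # __NAME__ is a __FOOD__ __EATTYPE__.
    print(mapping)
```

Exact matching only covers values that appear verbatim in the text. Paraphrased values (for example "moderately priced" for a moderate price range) and pronouns or other referring expressions would need a more elaborate alignment than this sketch provides.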
