SportSett:Basketball - A robust and maintainable dataset for Natural Language Generation

2020-09-07 · IntelLang 2020

Craig Thomson, Ehud Reiter, Somayajulu Sripada

Abstract

Data2Text Natural Language Generation is a complex and varied task. We investigate the data requirements for the difficult real-world problem of generating statistic-focused summaries of basketball games. This has recently been tackled using the Rotowire and Rotowire-FG datasets of paired data and text. It can, however, be difficult to filter, query, and maintain such large volumes of data. In this resource paper, we introduce the SportSett:Basketball database. This easy-to-use resource allows for simple scripts to be written which generate data in suitable formats for a variety of systems. Building upon the existing data, we provide more attributes, across multiple dimensions, increasing the overlap of content between data and text. We also highlight and resolve issues of training, validation and test partition contamination in these previous datasets.
