Challenges in Data-to-Document Generation

2017-07-25 · EMNLP 2017 · Code Available

Sam Wiseman, Stuart M. Shieber, Alexander M. Rush

Code

github.com/harvardnlp/boxscore-data (official, in paper, ★ 0)
github.com/harvardnlp/data2text (official, in paper, ★ 0)
github.com/KaijuML/rotowire-rg-metric (PyTorch, ★ 7)
github.com/ratishsp/data2text-1 (★ 0)

Abstract

Recent neural models have shown significant progress on the problem of generating short descriptive texts conditioned on a small number of database records. In this work, we suggest a slightly more difficult data-to-text generation task, and investigate how effective current approaches are on this task. In particular, we introduce a new, large-scale corpus of data records paired with descriptive documents, propose a series of extractive evaluation methods for analyzing performance, and obtain baseline results using current neural generation methods. Experiments show that these models produce fluent text, but fail to convincingly approximate human-generated documents. Moreover, even templated baselines exceed the performance of these neural models on some metrics, though copy- and reconstruction-based extensions lead to noticeable improvements.

Tasks

Data-to-Text Generation Descriptive Text Generation

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
RotoWire	Encoder-decoder + conditional copy	BLEU	14.19	—	Unverified
RotoWire (Content Ordering)	Encoder-decoder + conditional copy	DLD	8.68	—	Unverified
RotoWire (Content Selection)	Encoder-decoder + conditional copy	Precision	29.49	—	Unverified
RotoWire (Relation Generation)	Encoder-decoder + conditional copy	Precision	74.8	—	Unverified
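
The DLD column for content ordering refers to a Damerau-Levenshtein distance computed between the sequence of records extracted from a generated document and the sequence extracted from the reference. As a rough illustration only, the sketch below computes the optimal string alignment variant of that distance (a restricted form that never edits a transposed pair twice) and turns it into a percentage similarity. The record tuples, the function names, and the choice to normalize by the longer sequence are assumptions made for this example; they are not taken from the paper's released code.

```python
from typing import Hashable, Sequence


def osa_distance(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Optimal string alignment (restricted Damerau-Levenshtein) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i leading items of a
    for j in range(cols):
        d[0][j] = j  # insert all j leading items of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # Transposition of two adjacent items counts as one edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def ordering_similarity(gold: Sequence[Hashable], generated: Sequence[Hashable]) -> float:
    """Assumed normalization: 100 * (1 - distance / length of longer sequence)."""
    longest = max(len(gold), len(generated))
    if longest == 0:
        return 100.0  # two empty sequences are trivially in the same order
    return 100.0 * (1 - osa_distance(gold, generated) / longest)


# Hypothetical (entity, record type, value) tuples, in order of mention.
gold_records = [("Player A", "PTS", 10), ("Player B", "REB", 5)]
generated_records = [("Player B", "REB", 5), ("Player A", "PTS", 10)]
print(ordering_similarity(gold_records, generated_records))  # 50.0
```

Under this normalization, swapping the only two records costs one edit out of a maximum of two, giving 50.0; identical sequences score 100.0.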
