
SumTitles: a Summarization Dataset with Low Extractiveness

2020-12-01 · COLING 2020

Valentin Malykh, Konstantin Chernis, Ekaterina Artemova, Irina Piontkovskaya


Abstract

The existing dialogue summarization corpora are significantly extractive. We introduce a methodology for dataset extractiveness evaluation and present a new low-extractive corpus of movie dialogues for abstractive text summarization along with baseline evaluation. The corpus contains 153k dialogues and consists of three parts: 1) automatically aligned subtitles, 2) automatically aligned scenes from scripts, and 3) manually aligned scenes from scripts. We also present an alignment algorithm which we use to construct the corpus.
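To make the notion of extractiveness concrete, here is a minimal sketch of one common proxy: the share of a summary's n-grams that also occur in the source text. This is an illustrative assumption, not necessarily the evaluation methodology the paper introduces; the function names `ngrams` and `ngram_overlap` and the whitespace tokenization are hypothetical choices made for this example.

```python
from typing import List, Set, Tuple


def ngrams(tokens: List[str], n: int) -> Set[Tuple[str, ...]]:
    """Return the set of distinct n-grams in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_overlap(source: str, summary: str, n: int = 1) -> float:
    """Fraction of distinct summary n-grams that also appear in the source.

    Values near 1.0 suggest a summary largely copied from the source
    (extractive); values near 0.0 suggest a more abstractive one.
    Tokenization is naive lowercase whitespace splitting (an assumption).
    """
    source_ngrams = ngrams(source.lower().split(), n)
    summary_ngrams = ngrams(summary.lower().split(), n)
    if not summary_ngrams:
        # Empty summary, or summary shorter than n tokens.
        return 0.0
    return len(summary_ngrams & source_ngrams) / len(summary_ngrams)


if __name__ == "__main__":
    dialogue = "the cat sat on the mat"
    summary = "the cat slept"
    print(ngram_overlap(dialogue, summary, n=1))
    print(ngram_overlap(dialogue, summary, n=2))
```

Averaging such a score over all dialogue and summary pairs gives a rough corpus-level figure; a low-extractive corpus in the sense of the abstract would be expected to score lower than a highly extractive one, though the exact measure used by the authors may differ.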
