SumTitles: a Summarization Dataset with Low Extractiveness
2020-12-01 · COLING 2020
Valentin Malykh, Konstantin Chernis, Ekaterina Artemova, Irina Piontkovskaya
Abstract
Existing dialogue summarization corpora are highly extractive. We introduce a methodology for evaluating dataset extractiveness and present a new low-extractive corpus of movie dialogues for abstractive text summarization, along with a baseline evaluation. The corpus contains 153k dialogues and consists of three parts: 1) automatically aligned subtitles, 2) automatically aligned scenes from scripts, and 3) manually aligned scenes from scripts. We also present the alignment algorithm that we use to construct the corpus.
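To make the notion of extractiveness concrete, here is a minimal sketch of one common proxy: the share of summary n-grams that are copied verbatim from the source. This is not the methodology introduced in the paper, which the abstract does not specify. The function names `ngrams` and `copied_ngram_ratio`, the whitespace tokenization, and the default `n = 2` are all assumptions made for illustration.

```python
from typing import List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def copied_ngram_ratio(source: str, summary: str, n: int = 2) -> float:
    """Fraction of summary n-grams that also occur in the source.

    Hypothetical illustrative metric, not the paper's method.
    Tokenization is naive: lowercase, then split on whitespace.
    """
    source_ngrams = set(ngrams(source.lower().split(), n))
    summary_ngrams = ngrams(summary.lower().split(), n)
    if not summary_ngrams:
        # Summary shorter than n tokens: nothing to compare.
        return 0.0
    copied = sum(1 for gram in summary_ngrams if gram in source_ngrams)
    return copied / len(summary_ngrams)


if __name__ == "__main__":
    dialogue = "the cat sat on the mat"
    print(copied_ngram_ratio(dialogue, "the cat sat"))          # fully copied
    print(copied_ngram_ratio(dialogue, "a dog barked loudly"))  # nothing copied
```

Under this proxy, a ratio near 1 suggests the summary is mostly copied from the source, and a ratio near 0 suggests it is mostly rephrased. Averaging the ratio over all source and summary pairs would give a rough corpus-level score.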