TL;DR: Mining Reddit to Learn Automatic Summarization
2017-09-01 · WS 2017
Michael Völske, Martin Potthast, Shahbaz Syed, Benno Stein
Abstract
Recent advances in automatic text summarization have used deep neural networks to generate high-quality abstractive summaries, but the performance of these models strongly depends on large amounts of suitable training data. We propose a new method for mining social media for author-provided summaries, taking advantage of the common practice of appending a "TL;DR" to long posts. A case study using a large Reddit crawl yields the Webis-TLDR-17 dataset, complementing existing corpora primarily from the news genre. Our technique is likely applicable to other social media sites and general web crawls.
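The core idea of splitting a post at its "TL;DR" marker can be illustrated with a minimal sketch. This is not the authors' pipeline: the regular expression, the function name split_post, and the min_content_words threshold are assumptions made for illustration, and a real crawl would presumably need further filtering that the abstract does not describe.

```python
import re
from typing import Optional, Tuple

# Assumed marker pattern (not from the paper): matches "TL;DR", "tl dr", "TLDR",
# case-insensitively, optionally followed by ':' or '-' and whitespace.
TLDR_MARKER = re.compile(r"\btl\s*;?\s*dr\b\s*[:\-]?\s*", re.IGNORECASE)


def split_post(text: str, min_content_words: int = 5) -> Optional[Tuple[str, str]]:
    """Split a post into (content, summary) at its last TL;DR marker.

    Returns None when there is no marker, the summary is empty, or the
    content is shorter than min_content_words (a hypothetical threshold).
    """
    matches = list(TLDR_MARKER.finditer(text))
    if not matches:
        return None
    last = matches[-1]  # treat the final marker as the appended summary
    content = text[:last.start()].strip()
    summary = text[last.end():].strip()
    if not summary or len(content.split()) < min_content_words:
        return None
    return content, summary


if __name__ == "__main__":
    post = ("I spent three weeks debugging a flaky test in our build. "
            "TL;DR: the clock was wrong.")
    print(split_post(post))
```

Using the last marker reflects the abstract's description of a summary appended to a long post; posts that lead with the marker would need separate handling.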