SOTAVerified

Learning to Prioritize: Precision-Driven Sentence Filtering for Long Text Summarization

2022-06-01LREC 2022Unverified0· sign in to hype

Alex Mei, Anisha Kabir, Rukmini Bapat, John Judge, Tony Sun, William Yang Wang

Unverified — Be the first to reproduce this paper.

Reproduce

Abstract

Neural text summarization has shown great potential in recent years. However, current state-of-the-art summarization models are limited by their maximum input length, posing a challenge to summarizing longer texts comprehensively. As part of a layered summarization architecture, we introduce PureText, a simple yet effective pre-processing layer that removes low- quality sentences in articles to improve existing summarization models. When evaluated on popular datasets like WikiHow and Reddit TIFU, we show up to 3.84 and 8.57 point ROUGE-1 absolute improvement on the full test set and the long article subset, respectively, for state-of-the-art summarization models such as BertSum and BART. Our approach provides downstream models with higher-quality sentences for summarization, improving overall model performance, especially on long text articles.

Tasks

Reproductions