
NewsEdits: A Dataset of News Article Revision Histories and a Novel Approach to Document-Level Edit Reasoning

2021-11-16 · ACL ARR November 2021

Anonymous


Abstract

News article revision histories have the potential to yield novel insights across varied fields of linguistics and the social sciences. In this work, we present NewsEdits, the first publicly available dataset of news revision histories. Our dataset is massive and multilingual: it contains 1.2 million articles with 4.6 million versions from over 22 English- and French-language newspaper sources based in three countries. We develop a highly scalable sentence-matching algorithm, which we use to reliably extract document- and sentence-level edit actions: Add, Delete, Edit, and Move. We conduct analyses characterizing the nature of edits and show that sentences added or deleted between article versions are more likely than unchanged sentences to contain updating events, main content, and quotes. Finally, we introduce three novel tasks aimed at predicting edit actions given an old version. We show that these tasks are learnable but challenging for current large NLP models compared with expert human judgement. By offering insights into how news articles grow and update, we hope this work can spur research into narrative framing and development and into the informational needs of updating news stories.
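To make the four edit actions concrete, here is a minimal sketch of how two versions of an article might be compared sentence by sentence. It is not the paper's algorithm, which the abstract does not specify. As illustrative assumptions, it uses character-level similarity from Python's difflib, a greedy one-to-one matching, and a hypothetical similarity threshold of 0.75. The function names label_edits and longest_increasing_positions are likewise invented for this example.

```python
from difflib import SequenceMatcher


def longest_increasing_positions(seq):
    """Return the positions in seq forming one longest increasing subsequence."""
    if not seq:
        return set()
    best_len = [1] * len(seq)
    prev = [-1] * len(seq)
    for k in range(len(seq)):
        for m in range(k):
            if seq[m] < seq[k] and best_len[m] + 1 > best_len[k]:
                best_len[k] = best_len[m] + 1
                prev[k] = m
    end = max(range(len(seq)), key=best_len.__getitem__)
    keep = set()
    while end != -1:
        keep.add(end)
        end = prev[end]
    return keep


def label_edits(old, new, threshold=0.75):
    """Label sentence-level actions between two lists of sentences.

    Returns (action, old_index, new_index) tuples. The threshold is an
    assumed value for illustration, not one taken from the paper.
    """
    # Score every old/new sentence pair, most similar first.
    candidates = sorted(
        (
            (SequenceMatcher(None, o, n).ratio(), i, j)
            for i, o in enumerate(old)
            for j, n in enumerate(new)
        ),
        key=lambda t: -t[0],
    )
    # Greedy one-to-one matching above the threshold.
    match = {}
    used_new = set()
    for score, i, j in candidates:
        if score < threshold:
            break
        if i in match or j in used_new:
            continue
        match[i] = j
        used_new.add(j)

    # Matched sentences that break the relative order count as moved.
    pairs = sorted(match.items())
    in_order = longest_increasing_positions([j for _, j in pairs])

    result = []
    for pos, (i, j) in enumerate(pairs):
        if pos not in in_order:
            action = "Move"  # a moved sentence may also have been edited
        elif old[i] != new[j]:
            action = "Edit"
        else:
            action = "Unchanged"
        result.append((action, i, j))
    result += [("Delete", i, None) for i in range(len(old)) if i not in match]
    result += [("Add", None, j) for j in range(len(new)) if j not in used_new]
    return result


old_version = [
    "The mayor spoke on Monday.",
    "Police are investigating.",
    "The road remains closed.",
]
new_version = [
    "Police are investigating.",
    "The mayor spoke on Tuesday.",
    "A suspect was arrested overnight.",
]
for action in label_edits(old_version, new_version):
    print(action)
```

Comparing every sentence pair is quadratic in article length, so a version meant for millions of article versions would need a cheaper candidate-selection step. The sketch only illustrates what the four labels mean.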
