The Data Challenge in Misinformation Detection: Source Reputation vs. Content Veracity

2018-11-01 · WS 2018 · Code Available

Fatemeh Torabi Asr, Maite Taboada

Abstract

Misinformation detection at the level of full news articles is a text classification problem. Reliably labeled data in this domain is rare. Previous work relied on news articles collected from so-called "reputable" and "suspicious" websites and labeled accordingly. We leverage fact-checking websites to collect individually labeled news articles with regard to the veracity of their content, and we use this data to test the cross-domain generalization of a classifier trained on larger text collections that are labeled according to source reputation. Our results suggest that reputation-based classification is not sufficient for predicting the veracity level of the majority of news articles, and that system performance on different test datasets depends on topic distribution. Therefore, collecting well-balanced and carefully assessed training data is a priority for developing robust misinformation detection systems.
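The cross-domain setup described in the abstract can be sketched in a few lines: train on articles that inherit a label from their source's reputation, then evaluate on articles labeled individually for veracity. The sketch below is only an illustration under stated assumptions. The toy texts, the label names, the reputation-to-veracity mapping, and the choice of a bag-of-words Naive Bayes classifier are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def train_nb(docs, labels):
    """Collect per-class word counts for a multinomial Naive Bayes model."""
    word_counts = {}
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = tokenize(text)
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return {"word_counts": word_counts,
            "class_counts": Counter(labels),
            "vocab": vocab}


def predict_nb(model, text):
    """Return the class with the highest add-one-smoothed log score."""
    total_docs = sum(model["class_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label in sorted(model["class_counts"]):
        counts = model["word_counts"][label]
        total = sum(counts.values())
        score = math.log(model["class_counts"][label] / total_docs)
        for tok in tokenize(text):
            if tok in model["vocab"]:  # ignore words unseen in training
                score += math.log((counts[tok] + 1) / (total + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted label matches the gold label."""
    hits = sum(predict_nb(model, d) == y for d, y in zip(docs, labels))
    return hits / len(docs)


# Hypothetical training set: each article inherits its SOURCE's reputation.
train_docs = [
    "senate passes budget bill after long debate",
    "central bank holds interest rates steady",
    "shocking miracle cure doctors hate revealed",
    "secret plot exposed you will not believe",
]
source_reputation = ["reputable", "reputable", "suspicious", "suspicious"]

# Assumed mapping from source reputation to a veracity label.
reputation_to_veracity = {"reputable": "true", "suspicious": "false"}
train_labels = [reputation_to_veracity[r] for r in source_reputation]

# Hypothetical test set: each article is labeled INDIVIDUALLY for veracity.
test_docs = [
    "senate budget bill includes secret miracle funding",
    "doctors revealed new interest in cure",
]
test_labels = ["false", "true"]

model = train_nb(train_docs, train_labels)
cross_domain_accuracy = accuracy(model, test_docs, test_labels)
print(f"cross-domain accuracy: {cross_domain_accuracy:.2f}")
```

Keeping the two label sources separate in the code makes the mismatch explicit: the classifier only ever sees reputation-derived labels, so any score on the individually labeled set measures how well source reputation transfers to content veracity. The number printed for this toy data carries no empirical meaning.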