Correlating Twitter Language with Community-Level Health Outcomes
Arno Schneuwly, Ralf Grubenmann, Séverine Rion Logean, Mark Cieliebak, Martin Jaggi
Code Available — Be the first to reproduce this paper.
ReproduceCode
- github.com/epfml/correlating-tweetsOfficialIn papernone★ 0
Abstract
We study how language on social media is linked to diseases such as atherosclerotic heart disease (AHD), diabetes and various types of cancer. Our proposed model leverages state-of-the-art sentence embeddings, followed by a regression model and clustering, without the need of additional labelled data. It allows to predict community-level medical outcomes from language, and thereby potentially translate these to the individual level. The method is applicable to a wide range of target variables and allows us to discover known and potentially novel correlations of medical outcomes with life-style aspects and other socioeconomic risk factors.