Tracing State-Level Obesity Prevalence from Sentence Embeddings of Tweets: A Feasibility Study

2019-11-26

Xiaoyi Zhang, Rodoniki Athanasiadou, Narges Razavian


Abstract

Twitter data has been shown to be broadly applicable to public health surveillance. Previous public health studies based on Twitter data have largely relied on keyword matching or topic models to cluster relevant tweets. However, both methods suffer from the short length of tweets and the unpredictable noise that naturally occurs in user-generated content. In response, we introduce a deep learning approach that uses hashtags as a form of supervision and learns tweet embeddings for extracting informative textual features. In this case study, we address the specific task of estimating state-level obesity from diet-related textual features. Our approach yields estimates from these textual features that correlate strongly with government data, and it outperforms the keyword-matching baseline. The results also demonstrate the potential for discovering risk factors using the textual features. This method is general-purpose and can be applied to a wide range of Twitter-based public health studies.
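The abstract does not describe how the learned tweet embeddings are turned into state-level estimates. The sketch below shows one generic way that final stage could look, assuming the embeddings have already been produced (for example, by a hashtag-supervised encoder). Several choices here are illustrative assumptions and not the authors' pipeline: mean pooling per state, closed-form ridge regression, leave-one-state-out evaluation, and Pearson correlation against reference rates. The function names are also hypothetical, and the demo data are synthetic rather than real tweets or government figures.

```python
import numpy as np


def state_features(embeddings, states):
    """Hypothetical aggregation: average all tweet embeddings from each state."""
    labels = sorted(set(states))
    states = np.asarray(states)
    feats = np.stack([embeddings[states == s].mean(axis=0) for s in labels])
    return labels, feats


def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    d = X.shape[1]
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(d), Xc.T @ (y - y_mean))
    b = y_mean - x_mean @ w
    return w, b


def loo_predictions(X, y, alpha=1.0):
    """Leave-one-state-out: each state is predicted by a model that never saw it."""
    preds = np.empty(len(y), dtype=float)
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        w, b = ridge_fit(X[mask], y[mask], alpha)
        preds[i] = X[i] @ w + b
    return preds


def pearson(a, b):
    """Pearson correlation between predicted and reference state-level rates."""
    return float(np.corrcoef(a, b)[0, 1])


# Synthetic demo: 50 states, 40 random 16-d "tweet embeddings" per state.
rng = np.random.default_rng(0)
n_states, d = 50, 16
state_names = [f"S{i:02d}" for i in range(n_states)]
states = np.repeat(state_names, 40)
embeddings = rng.normal(size=(len(states), d))
labels, X = state_features(embeddings, states.tolist())

# Synthetic stand-in for state-level obesity rates (not real data).
true_w = rng.normal(size=d)
y = X @ true_w + 0.05 * rng.normal(size=n_states)

preds = loo_predictions(X, y, alpha=0.1)
print(f"Leave-one-state-out Pearson r = {pearson(preds, y):.3f}")
```

Leave-one-state-out evaluation is used here because there are only about 50 data points (one per state). Scoring each state with a model fit on the others gives a less optimistic correlation than fitting and evaluating on the same states. A keyword-matching baseline could be compared in the same harness by replacing `X` with per-state keyword counts.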

Tasks

Clustering, Sentence, Sentence Embeddings, Topic Models
