Detecting Causal Language Use in Science Findings

2019-11-01 · IJCNLP 2019

Bei Yu, Yingya Li, Jun Wang


Abstract

Causal interpretation of correlational findings from observational studies has been a major type of misinformation in science communication. Prior studies on identifying inappropriate use of causal language relied on manual content analysis, which does not scale to a large volume of science publications. In this study, we first annotated a corpus of over 3,000 PubMed research conclusion sentences, then developed a BERT-based prediction model that classifies conclusion sentences into "no relationship", "correlational", "conditional causal", and "direct causal" categories, achieving an accuracy of 0.90 and a macro-F1 of 0.88. We then applied the prediction model to measure causal language use in the research conclusions of about 38,000 observational studies in PubMed. The predictions show that 21.7% of the studies used direct causal language exclusively in their conclusions, and 32.4% used some direct causal language. We also found that the ratio of causal language use differs among authors from different countries, challenging the notion of a shared consensus on causal language use in the global science community. Our prediction model could also be used to help identify inappropriate use of causal language in science publications.
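
The abstract reports two kinds of numbers: sentence-level scores (accuracy and macro-F1 over the four categories) and study-level shares (studies using direct causal language "exclusively" versus "some"). The sketch below illustrates how such numbers can be computed from predicted labels. It is not the authors' code and does not reproduce the BERT model; it assumes sentence-level predictions are already available. The function names (`accuracy`, `macro_f1`, `study_level_rates`) are hypothetical, and "exclusively" is interpreted here as every conclusion sentence of a study being predicted "direct causal", while "some" means at least one is. Under that reading, the "some" share always includes the "exclusively" share, which is consistent with 32.4% exceeding 21.7%.

```python
from typing import Dict, List

DIRECT = "direct causal"  # one of the four categories named in the abstract


def accuracy(gold: List[str], pred: List[str]) -> float:
    """Fraction of sentences whose predicted label matches the gold label."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and the same length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold: List[str], pred: List[str]) -> float:
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and the same length")
    scores = []
    for label in sorted(set(gold) | set(pred)):
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


def study_level_rates(predictions: Dict[str, List[str]]) -> Dict[str, float]:
    """Share of studies whose conclusion sentences are all direct causal
    ("exclusive") and share with at least one direct causal sentence ("some").
    Studies with no conclusion sentences are skipped (an assumption)."""
    studies = [labels for labels in predictions.values() if labels]
    if not studies:
        return {"exclusive": 0.0, "some": 0.0}
    n = len(studies)
    exclusive = sum(all(label == DIRECT for label in labels) for labels in studies)
    some = sum(any(label == DIRECT for label in labels) for labels in studies)
    return {"exclusive": exclusive / n, "some": some / n}


# Toy, made-up predictions for four hypothetical studies (not data from the paper).
toy = {
    "study_a": ["direct causal", "direct causal"],
    "study_b": ["correlational", "direct causal"],
    "study_c": ["correlational"],
    "study_d": ["no relationship", "conditional causal"],
}
print(study_level_rates(toy))  # {'exclusive': 0.25, 'some': 0.5}
```

Macro-F1 averages per-class F1 without weighting by class size, so it penalizes poor performance on rare categories more than accuracy does; this is why the two reported scores (0.90 and 0.88) can differ.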

Tasks

Misinformation Prediction