The Foundational Capabilities of Large Language Models in Predicting Postoperative Risks Using Clinical Notes
Charles Alba, Bing Xue, Joanna Abraham, Thomas Kannampallil, Chenyang Lu
- Code: github.com/cja5553/LLMs_in_perioperative_care (official implementation, PyTorch)
Abstract
Clinical notes recorded during a patient's perioperative journey hold immense informational value, and advances in large language models (LLMs) offer opportunities to put that information to use. Using 84,875 pre-operative notes and their associated surgical cases from 2018 to 2021, we examine the performance of LLMs in predicting six postoperative risks under various fine-tuning strategies. Pretrained LLMs outperformed traditional word embeddings by an absolute 38.3% in AUROC and 33.2% in AUPRC. Self-supervised fine-tuning further improved AUROC by 3.2% and AUPRC by 1.5%. Incorporating labels into training increased AUROC by a further 1.8% and AUPRC by 2%. The highest performance was achieved with a unified foundation model, which improved AUROC by 3.6% and AUPRC by 2.6% compared to self-supervision. These results highlight the foundational capabilities of LLMs in predicting postoperative risks, which could be beneficial when deployed for perioperative care.
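The abstract reports its gains as absolute differences in AUROC and AUPRC. As a minimal sketch of how such a difference can be computed, the Python below implements both metrics from scratch and compares two sets of predicted risks. The function names (`auroc`, `average_precision`, `absolute_gain_points`) and all labels and scores are made up for illustration and are not taken from the paper or its repository. AUPRC is approximated here by average precision, and the sketch assumes there are no tied scores.

```python
def auroc(labels, scores):
    """AUROC as the chance that a random positive outranks a random negative.

    Tied positive/negative pairs count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision: mean of precision at the rank of each positive.

    Used here as an estimate of AUPRC; assumes no tied scores.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    precision_sum = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precision_sum += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive label")
    return precision_sum / hits


def absolute_gain_points(metric, labels, baseline_scores, model_scores):
    """Absolute difference between two models, in percentage points."""
    return 100.0 * (metric(labels, model_scores) - metric(labels, baseline_scores))


# Hypothetical toy data: 1 = complication occurred, 0 = it did not.
labels = [1, 0, 1, 0, 0, 1]
baseline_scores = [0.6, 0.7, 0.4, 0.3, 0.5, 0.8]  # e.g. a word-embedding baseline
model_scores = [0.9, 0.2, 0.7, 0.4, 0.1, 0.8]     # e.g. a pretrained LLM

auroc_gain = absolute_gain_points(auroc, labels, baseline_scores, model_scores)
auprc_gain = absolute_gain_points(average_precision, labels, baseline_scores, model_scores)
print(f"AUROC gain: {auroc_gain:.1f} points, AUPRC gain: {auprc_gain:.1f} points")
```

An absolute gain is the plain difference between two scores, so a move from 0.60 to 0.70 AUROC is a 10-point gain, not a 16.7% relative one. In practice a library routine such as scikit-learn's `roc_auc_score` and `average_precision_score` would normally replace these hand-written functions.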