SOTAVerified

Mimic-IV-ICD: A new benchmark for eXtreme MultiLabel Classification

2023-04-27 · Code Available

Thanh-Tung Nguyen, Viktor Schlegel, Abhinav Kashyap, Stefan Winkler, Shao-Syuan Huang, Jie-Jyun Liu, Chih-Jen Lin

Abstract

Clinical notes are assigned ICD codes, which are sets of codes for diagnoses and procedures. In recent years, predictive machine learning models have been built for automatic ICD coding. However, there is a lack of widely accepted benchmarks for automated ICD coding models based on large-scale public EHR data. This paper proposes a public benchmark suite for ICD-10 coding using a large EHR dataset derived from MIMIC-IV, the most recent public EHR dataset. We implement and compare several popular methods for ICD coding prediction tasks to standardize data preprocessing and establish a comprehensive ICD coding benchmark dataset. This approach fosters reproducibility and model comparison, accelerating progress toward employing automated ICD coding in future studies. Furthermore, we create a new ICD-9 benchmark using MIMIC-IV data, providing more data points and a higher number of ICD codes than MIMIC-III. Our open-source code offers easy access to data processing steps, benchmark creation, and experiment replication for those with MIMIC-IV access, providing insights, guidance, and protocols to efficiently develop ICD coding models.
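Because each note carries a set of codes rather than a single class, ICD coding is usually framed as multi-label classification with one binary target per code. The following is a minimal sketch of that target encoding, not the benchmark's own preprocessing: the function names `build_label_index` and `multi_hot` are hypothetical, and the three toy notes with their ICD-10 codes are made up for illustration.

```python
from typing import Dict, List, Sequence


def build_label_index(code_sets: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Map every distinct code to a column index, in sorted order."""
    vocab = sorted({code for codes in code_sets for code in codes})
    return {code: i for i, code in enumerate(vocab)}


def multi_hot(codes: Sequence[str], index: Dict[str, int]) -> List[int]:
    """Encode one note's code set as a 0/1 vector over the label vocabulary."""
    row = [0] * len(index)
    for code in codes:
        row[index[code]] = 1
    return row


# Hypothetical toy data: the ICD-10 codes assigned to three notes.
notes = [["I10", "E11.9"], ["I10"], ["J18.9", "E11.9"]]

index = build_label_index(notes)  # columns: E11.9, I10, J18.9
targets = [multi_hot(codes, index) for codes in notes]
```

In the full-code settings the label vocabulary has thousands of columns, which is what makes the problem "extreme"; a real pipeline would likely store the targets as a sparse matrix rather than nested lists.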

Benchmark Results

| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| MIMIC-IV-ICD-10-full | CAML | Macro-AUC | 89.91 | | Unverified |
| MIMIC-IV-ICD-10-full | PLM | Macro-AUC | 91.85 | | Unverified |
| MIMIC-IV-ICD-10-full | LAAT | Macro-AUC | 92.96 | | Unverified |
| MIMIC-IV-ICD-10-full | Joint LAAT | Macro-AUC | 93.64 | | Unverified |
| MIMIC-IV-ICD-10-full | MSMN | Macro-AUC | 97.07 | | Unverified |
| MIMIC-IV-ICD10-top50 | MSMN | F1 (micro) | 74.15 | | Unverified |
| MIMIC-IV-ICD10-top50 | PLM-ICD | F1 (micro) | 73.27 | | Unverified |
| MIMIC-IV-ICD10-top50 | Joint LAAT | F1 (micro) | 72.85 | | Unverified |
| MIMIC-IV-ICD10-top50 | LAAT | F1 (micro) | 72.56 | | Unverified |
| MIMIC-IV-ICD10-top50 | CAML | F1 (micro) | 67.56 | | Unverified |
| MIMIC-IV-ICD9-full | CAML | Macro AUC | 93.45 | | Unverified |
| MIMIC-IV-ICD9-full | LAAT | Macro AUC | 95.18 | | Unverified |
| MIMIC-IV-ICD9-full | Joint LAAT | Macro AUC | 95.57 | | Unverified |
| MIMIC-IV-ICD9-full | PLM-ICD | Macro AUC | 96.61 | | Unverified |
| MIMIC-IV-ICD9-full | MSMN | Macro AUC | 96.79 | | Unverified |
| MIMIC-IV-ICD9-top50 | MSMN | AUC Macro | 95.13 | | Unverified |
| MIMIC-IV-ICD9-top50 | PLM-ICD | AUC Macro | 94.97 | | Unverified |
| MIMIC-IV-ICD9-top50 | Joint LAAT | AUC Macro | 94.92 | | Unverified |
| MIMIC-IV-ICD9-top50 | LAAT | AUC Macro | 94.88 | | Unverified |
| MIMIC-IV-ICD9-top50 | CAML | AUC Macro | 93.07 | | Unverified |
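
The results above are reported as micro F1 and macro AUC. As a rough illustration of what these two multi-label metrics measure, here is a minimal pure-Python sketch; it is not the paper's evaluation code. The function names, the toy data, the 0.5 decision threshold, and the choice to skip labels that have only positives or only negatives when averaging AUC are all assumptions made for this example. The functions return fractions in [0, 1], whereas the listing shows percentages.

```python
from typing import List, Sequence


def micro_f1(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Micro F1: pool TP/FP/FN over every (note, label) cell, then compute F1."""
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half. Quadratic in the number of examples, which is fine
    for a toy example but too slow for a real benchmark.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_auc(y_true: Sequence[Sequence[int]], y_score: Sequence[Sequence[float]]) -> float:
    """Macro AUC: unweighted mean of per-label AUCs.

    Assumption: labels with no positives or no negatives are skipped,
    because their AUC is undefined.
    """
    aucs: List[float] = []
    for j in range(len(y_true[0])):
        col_true = [row[j] for row in y_true]
        if 0 < sum(col_true) < len(col_true):
            aucs.append(roc_auc(col_true, [row[j] for row in y_score]))
    if not aucs:
        raise ValueError("no label has both positive and negative examples")
    return sum(aucs) / len(aucs)


# Hypothetical toy data: 4 notes, 2 labels.
y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
y_score = [[0.9, 0.2], [0.4, 0.7], [0.6, 0.3], [0.1, 0.45]]
y_pred = [[1 if s >= 0.5 else 0 for s in row] for row in y_score]

f1 = micro_f1(y_true, y_pred)     # 3 TP, 0 FP, 1 FN
auc = macro_auc(y_true, y_score)  # mean of per-label AUCs 1.0 and 0.75
```

Micro averaging weights every (note, code) decision equally, so frequent codes dominate the score. Macro averaging gives each code the same weight, so it is more sensitive to performance on rare codes.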
