Long-Tailed Classification of Thorax Diseases on Chest X-Ray: A New Benchmark Study
Gregory Holste, Song Wang, Ziyu Jiang, Thomas C. Shen, George Shih, Ronald M. Summers, Yifan Peng, Zhangyang Wang
Code: github.com/vita-group/longtailcxr (official, in paper, PyTorch)
Abstract
Imaging exams, such as chest radiography, yield a small set of common findings and a much larger set of uncommon findings. While a trained radiologist can learn the visual presentation of rare conditions by studying a few representative examples, teaching a machine to learn from such a "long-tailed" distribution is much more difficult, because standard methods are easily biased toward the most frequent classes. In this paper, we present a comprehensive benchmark study of the long-tailed learning problem in the specific domain of thorax diseases on chest X-rays. We focus on learning from naturally distributed chest X-ray data, optimizing classification accuracy over not only the common "head" classes but also the rare yet critical "tail" classes. To accomplish this, we introduce a challenging new long-tailed chest X-ray benchmark to facilitate research on long-tailed learning methods for medical image classification. The benchmark consists of two chest X-ray datasets for 19- and 20-way thorax disease classification, containing classes with as many as 53,000 and as few as 7 labeled training images. We evaluate both standard and state-of-the-art long-tailed learning methods on this new benchmark, analyze which aspects of these methods are most beneficial for long-tailed medical image classification, and summarize insights for future algorithm design. The datasets, trained models, and code are available at https://github.com/VITA-Group/LongTailCXR.
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| MIMIC-CXR-LT | LDAM | Balanced Accuracy | 0.17 | — | Unverified |
| MIMIC-CXR-LT | Decoupling (cRT) | Balanced Accuracy | 0.30 | — | Unverified |
| MIMIC-CXR-LT | Reweighted LDAM-DRW | Balanced Accuracy | 0.28 | — | Unverified |
| MIMIC-CXR-LT | Class-balanced LDAM-DRW | Balanced Accuracy | 0.27 | — | Unverified |
| MIMIC-CXR-LT | Reweighted LDAM | Balanced Accuracy | 0.24 | — | Unverified |
| MIMIC-CXR-LT | Reweighted Focal Loss | Balanced Accuracy | 0.24 | — | Unverified |
| MIMIC-CXR-LT | Decoupling (tau-norm) | Balanced Accuracy | 0.23 | — | Unverified |
| MIMIC-CXR-LT | Class-balanced Softmax | Balanced Accuracy | 0.23 | — | Unverified |
| MIMIC-CXR-LT | Class-balanced LDAM | Balanced Accuracy | 0.23 | — | Unverified |
| MIMIC-CXR-LT | Reweighted Softmax | Balanced Accuracy | 0.21 | — | Unverified |
| MIMIC-CXR-LT | Class-balanced Focal Loss | Balanced Accuracy | 0.19 | — | Unverified |
| MIMIC-CXR-LT | MixUp | Balanced Accuracy | 0.18 | — | Unverified |
| MIMIC-CXR-LT | Focal Loss | Balanced Accuracy | 0.17 | — | Unverified |
| MIMIC-CXR-LT | Softmax | Balanced Accuracy | 0.17 | — | Unverified |
| MIMIC-CXR-LT | Balanced-MixUp | Balanced Accuracy | 0.17 | — | Unverified |
| NIH-CXR-LT | Decoupling (cRT) | Balanced Accuracy | 0.29 | — | Unverified |
| NIH-CXR-LT | Reweighted LDAM-DRW | Balanced Accuracy | 0.29 | — | Unverified |
| NIH-CXR-LT | Class-balanced LDAM-DRW | Balanced Accuracy | 0.28 | — | Unverified |
| NIH-CXR-LT | Reweighted LDAM | Balanced Accuracy | 0.28 | — | Unverified |
| NIH-CXR-LT | Class-balanced Softmax | Balanced Accuracy | 0.27 | — | Unverified |
| NIH-CXR-LT | Reweighted Softmax | Balanced Accuracy | 0.26 | — | Unverified |
| NIH-CXR-LT | Class-balanced LDAM | Balanced Accuracy | 0.24 | — | Unverified |
| NIH-CXR-LT | Class-balanced Focal Loss | Balanced Accuracy | 0.23 | — | Unverified |
| NIH-CXR-LT | Decoupling (tau-norm) | Balanced Accuracy | 0.21 | — | Unverified |
| NIH-CXR-LT | Reweighted Focal Loss | Balanced Accuracy | 0.20 | — | Unverified |
| NIH-CXR-LT | LDAM | Balanced Accuracy | 0.18 | — | Unverified |
| NIH-CXR-LT | Balanced-MixUp | Balanced Accuracy | 0.16 | — | Unverified |
| NIH-CXR-LT | Focal Loss | Balanced Accuracy | 0.12 | — | Unverified |
| NIH-CXR-LT | MixUp | Balanced Accuracy | 0.12 | — | Unverified |
| NIH-CXR-LT | Softmax | Balanced Accuracy | 0.12 | — | Unverified |
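
The table reports balanced accuracy but this page does not define it. The sketch below assumes the common definition, the unweighted mean of per-class recall, and uses a toy two-class example to show why it is a stricter measure than plain accuracy on long-tailed data. The function name `balanced_accuracy` and the toy labels are illustrative and are not taken from the paper or its repository.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall over the classes present in y_true.

    Assumed definition; the benchmark's own evaluation code may differ.
    """
    correct = defaultdict(int)  # correctly predicted samples per true class
    total = defaultdict(int)    # number of samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Toy data: 8 samples of a "head" class (0) and 2 of a "tail" class (1).
y_true = [0] * 8 + [1] * 2
# A degenerate classifier that always predicts the head class.
y_pred = [0] * 10

plain_acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
bal_acc = balanced_accuracy(y_true, y_pred)

print(f"plain accuracy:    {plain_acc:.2f}")
print(f"balanced accuracy: {bal_acc:.2f}")
```

Because every class contributes equally to the average, a model that ignores the tail class is penalized even when its plain accuracy looks high.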