On undesired emergent behaviors in compound prostate cancer detection systems
Erlend Sortland Rolfsnes, Philip Thangngat, Trygve Eftestøl, Tobias Nordström, Fredrik Jäderling, Martin Eklund, Alvaro Fernandez-Quilez
Unverified — Be the first to reproduce this paper.
ReproduceAbstract
Artificial intelligence systems show promise to aid in the di- agnostic pathway of prostate cancer (PC), by supporting radiologists in interpreting magnetic resonance images (MRI) of the prostate. Most MRI-based systems are designed to detect clinically significant PC le- sions, with the main objective of preventing over-diagnosis. Typically, these systems involve an automatic prostate segmentation component and a clinically significant PC lesion detection component. In spite of the compound nature of the systems, evaluations are presented assum- ing a standalone clinically significant PC detection component. That is, they are evaluated in an idealized scenario and under the assumption that a highly accurate prostate segmentation is available at test time. In this work, we aim to evaluate a clinically significant PC lesion de- tection system accounting for its compound nature. For that purpose, we simulate a realistic deployment scenario and evaluate the effect of two non-ideal and previously validated prostate segmentation modules on the PC detection ability of the compound system. Following, we com- pare them with an idealized setting, where prostate segmentations are assumed to have no faults. We observe significant differences in the de- tection ability of the compound system in a realistic scenario and in the presence of the highest-performing prostate segmentation module (DSC: 90.07+-0.74), when compared to the idealized one (AUC: 77.93 +- 3.06 and 84.30+- 4.07, P<.001). Our results depict the relevance of holistic evalu- ations for PC detection compound systems, where interactions between system components can lead to decreased performance and degradation at deployment time.