Master your Metrics with Calibration

2019-09-06 · Code Available

Wissam Siblini, Jordan Fréry, Liyun He-Guelton, Frédéric Oblé, Yi-Qing Wang

Abstract

Machine learning models deployed in real-world applications are often evaluated with precision-based metrics such as the F1-score or AUC-PR (Area Under the Precision-Recall Curve). Because such metrics depend heavily on the class prior, they make it difficult to interpret the variation of a model's performance across different subpopulations or subperiods of a dataset. In this paper, we propose a way to calibrate the metrics so that they can be made invariant to the prior. We conduct a large number of experiments on balanced and imbalanced data to assess the behavior of the calibrated metrics, and show that they improve interpretability and provide better control over what is actually measured. We describe specific real-world use cases where calibration is beneficial, such as model monitoring in production, reporting, and fairness evaluation.
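To make the idea concrete, below is a minimal sketch of one way a prior calibration of precision (and hence of F1) can work: false positives are reweighted so that the negative-to-positive ratio matches a chosen reference prior pi0, while recall, which does not depend on the prior, is left unchanged. The function names and the exact reweighting are illustrative assumptions based on the abstract, not the paper's reference implementation.

# Illustrative sketch of prior-calibrated precision and F1 (Python).
# The weighting scheme below is an assumption, not the authors' code.

def calibrated_precision(tp, fp, fn, tn, pi0=0.5):
    """Precision computed as if the positive-class prior were pi0.

    tp, fp, fn, tn: confusion-matrix counts on the evaluated
    subpopulation/subperiod. pi0: reference prior.
    """
    pos = tp + fn                      # actual positives
    neg = fp + tn                      # actual negatives
    pi = pos / (pos + neg)             # actual class prior
    # Scale factor that maps the actual negative/positive ratio
    # (1 - pi) / pi onto the reference ratio (1 - pi0) / pi0.
    alpha = (pi * (1 - pi0)) / (pi0 * (1 - pi))
    return tp / (tp + alpha * fp)

def calibrated_f1(tp, fp, fn, tn, pi0=0.5):
    """Harmonic mean of calibrated precision and (prior-invariant) recall."""
    p = calibrated_precision(tp, fp, fn, tn, pi0)
    r = tp / (tp + fn)                 # recall is unaffected by the prior
    return 2 * p * r / (p + r)

# Two subperiods with identical per-class error rates (TPR=0.8, FPR=0.1)
# but different priors (10% vs. 1% positives):
a = calibrated_f1(tp=80, fp=90, fn=20, tn=810)       # prior 0.10
b = calibrated_f1(tp=80, fp=990, fn=20, tn=8910)     # prior 0.01
print(a, b)  # equal up to rounding: calibration removes the prior effect

On these two subperiods, plain precision drops from about 0.47 to about 0.07 purely because the prior changed, whereas the calibrated version stays constant, which is the interpretability gain the abstract describes.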
