
Can a calibration metric be both testable and actionable?

2025-02-27 · Code Available

Raphael Rossellini, Jake A. Soloff, Rina Foygel Barber, Zhimei Ren, Rebecca Willett


Abstract

Forecast probabilities often serve as critical inputs for binary decision making. In such settings, calibration (ensuring forecasted probabilities match empirical frequencies) is essential. Although the common notion of Expected Calibration Error (ECE) provides actionable insights for decision making, it is not testable: it cannot be empirically estimated in many practical cases. Conversely, the recently proposed Distance from Calibration (dCE) is testable but is not actionable since it lacks decision-theoretic guarantees needed for high-stakes applications. We introduce Cutoff Calibration Error, a calibration measure that bridges this gap by assessing calibration over intervals of forecasted probabilities. We show that Cutoff Calibration Error is both testable and actionable and examine its implications for popular post-hoc calibration methods, such as isotonic regression and Platt scaling.
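To make "assessing calibration over intervals of forecasted probabilities" concrete, the following is a minimal sketch of one plausible empirical version of such a measure. The formula is an assumption, not taken from the abstract: it takes the largest absolute value of the averaged residual `y - p` restricted to any interval of forecast values. The function name `interval_calibration_error` is hypothetical, and the paper's actual definition and estimator may differ.

```python
import numpy as np


def interval_calibration_error(probs, labels):
    """Largest absolute average residual over any interval of forecasts.

    Assumed form (illustrative, not quoted from the paper):
        max over intervals I of | (1/n) * sum_i (y_i - p_i) * 1{p_i in I} |
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    n = probs.size

    # Sort by forecast so that every interval picks a contiguous block.
    order = np.argsort(probs, kind="stable")
    p_sorted = probs[order]
    resid = labels[order] - p_sorted

    # prefix[k] is the sum of the first k residuals; a block sum is a
    # difference of two prefix sums.
    prefix = np.concatenate(([0.0], np.cumsum(resid)))

    # An interval cannot separate tied forecasts, so only prefix sums at
    # boundaries between distinct forecast values are valid endpoints.
    boundary = np.concatenate(([True], p_sorted[1:] != p_sorted[:-1], [True]))
    admissible = prefix[boundary]

    # The largest |difference| of two admissible prefix sums is max - min.
    return (admissible.max() - admissible.min()) / n


# Forecasts of 0.2 and 0.8 that are each off by 0.2 in opposite directions.
err = interval_calibration_error([0.2, 0.2, 0.8, 0.8], [0, 0, 1, 1])
# A constant forecast of 0.5 on a balanced sample has zero error.
zero = interval_calibration_error([0.5, 0.5], [0, 1])
```

Sorting plus prefix sums reduces the search over all intervals to a single pass, and the tie handling keeps a perfectly calibrated constant forecast from being scored as miscalibrated.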
