Normalized mutual information is a biased measure for classification and community detection

2023-07-03

Maximilian Jerdee, Alec Kirkley, M. E. J. Newman

Abstract

Normalized mutual information is widely used as a similarity measure for evaluating the performance of clustering and classification algorithms. In this paper, we argue that results returned by the normalized mutual information are biased for two reasons: first, because they ignore the information content of the contingency table and, second, because their symmetric normalization introduces spurious dependence on algorithm output. We introduce a modified version of the mutual information that remedies both of these shortcomings. As a practical demonstration of the importance of using an unbiased measure, we perform extensive numerical tests on a basket of popular algorithms for network community detection and show that one's conclusions about which algorithm is best are significantly affected by the biases in the traditional mutual information.
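
For readers unfamiliar with the quantity under discussion, the following is a minimal sketch of the conventional, symmetrically normalized mutual information, computed from the contingency table of two labelings. It is not the modified measure the authors propose, which the abstract does not specify. The function names (`entropy`, `mutual_information`, `symmetric_nmi`) and the choice of normalization, 2I / (H(truth) + H(pred)), are assumptions made for illustration.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (in nats) of a labeling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(truth, pred):
    """Mutual information (in nats) between two labelings of the same objects."""
    n = len(truth)
    joint = Counter(zip(truth, pred))  # contingency table as a sparse dict
    row = Counter(truth)               # group sizes in the ground truth
    col = Counter(pred)                # group sizes in the algorithm output
    return sum(
        (c / n) * log(c * n / (row[r] * col[s]))
        for (r, s), c in joint.items()
    )


def symmetric_nmi(truth, pred):
    """Conventional NMI with the symmetric normalization 2I / (H_truth + H_pred)."""
    h_truth, h_pred = entropy(truth), entropy(pred)
    if h_truth + h_pred == 0:
        return 1.0  # both labelings place everything in a single group
    return 2 * mutual_information(truth, pred) / (h_truth + h_pred)


truth = [0, 0, 1, 1]
perfect = symmetric_nmi(truth, [0, 0, 1, 1])      # identical labelings
unrelated = symmetric_nmi(truth, [0, 1, 0, 1])    # statistically independent
singletons = symmetric_nmi(truth, [0, 1, 2, 3])   # every object in its own group
```

In this toy case the denominator contains the entropy of the algorithm's own output, so the score depends on the predicted labeling through the normalization as well as through the mutual information. The singleton labeling, which reveals nothing about the true groups beyond what any labeling of distinct objects would, still scores 2/3.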