Normalized mutual information is a biased measure for classification and community detection
Maximilian Jerdee, Alec Kirkley, M. E. J. Newman
- Code: github.com/maxjerdee/reduced_mutual_information (official)
Abstract
Normalized mutual information is widely used as a similarity measure for evaluating the performance of clustering and classification algorithms. In this paper, we argue that results returned by the normalized mutual information are biased for two reasons: first, because they ignore the information content of the contingency table and, second, because their symmetric normalization introduces spurious dependence on algorithm output. We introduce a modified version of the mutual information that remedies both of these shortcomings. As a practical demonstration of the importance of using an unbiased measure, we perform extensive numerical tests on a basket of popular algorithms for network community detection and show that one's conclusions about which algorithm is best are significantly affected by the biases in the traditional mutual information.
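To make the first source of bias concrete, the following is a minimal sketch of the conventional, symmetrically normalized mutual information, computed from the contingency table of two labelings. It is not the modified measure proposed in the paper and is not taken from the authors' repository; the function names (`entropy`, `mutual_information`, `nmi_symmetric`) and the toy labelings are assumptions chosen for illustration, and natural logarithms are used throughout.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (in nats) of a labeling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(truth, candidate):
    """Plain mutual information (in nats) between two labelings."""
    n = len(truth)
    joint = Counter(zip(truth, candidate))  # contingency table counts
    row = Counter(truth)                    # group sizes in the ground truth
    col = Counter(candidate)                # group sizes in the candidate
    return sum(
        (c / n) * log(c * n / (row[r] * col[s]))
        for (r, s), c in joint.items()
    )


def nmi_symmetric(truth, candidate):
    """Conventional NMI: 2 I(truth; candidate) / (H(truth) + H(candidate))."""
    h_t, h_c = entropy(truth), entropy(candidate)
    if h_t + h_c == 0:
        return 1.0  # both labelings are a single group (illustrative convention)
    return 2 * mutual_information(truth, candidate) / (h_t + h_c)


# Hypothetical toy data: two equal-sized ground-truth groups.
truth = [0, 0, 0, 0, 1, 1, 1, 1]
# A trivial candidate that places every object in its own group.
singletons = list(range(len(truth)))

score_trivial = nmi_symmetric(truth, singletons)
score_perfect = nmi_symmetric(truth, truth)
print(score_trivial, score_perfect)
```

In this toy case the all-singletons candidate has mutual information equal to the full entropy of the ground truth, so it receives a symmetric NMI of about 0.5 even though it says nothing useful about the true grouping. The score also depends on the entropy of the candidate through the denominator, which is the dependence on algorithm output that the abstract refers to.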