Selective inference for k-means clustering

2022-03-29 · Code available

Yiqun T. Chen, Daniela M. Witten

Abstract

We consider the problem of testing for a difference in means between clusters of observations identified via k-means clustering. In this setting, classical hypothesis tests lead to an inflated Type I error rate. To overcome this problem, we take a selective inference approach. We propose a finite-sample p-value that controls the selective Type I error for a test of the difference in means between a pair of clusters obtained using k-means clustering, and show that it can be efficiently computed. We apply our proposal in simulation, and on hand-written digits data and single-cell RNA-sequencing data.
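
The inflated Type I error mentioned in the abstract can be seen in a small simulation. The sketch below is an illustration under simplifying assumptions, not the authors' method or code: the data are one-dimensional draws from a single N(0, 1) distribution (so there are no true clusters), the variance is treated as known, k = 2, and the "classical" test is a two-sided z-test on the difference in cluster means. The function names `two_means_1d`, `naive_z_pvalue`, and `naive_rejection_rate` are hypothetical helpers introduced here for illustration.

```python
import math
import random


def two_means_1d(x, n_iter=100):
    """Lloyd's algorithm with k=2 on 1-D data; returns a list of 0/1 labels."""
    # Deterministic initialisation at the extremes keeps both clusters non-empty.
    c0, c1 = min(x), max(x)
    labels = None
    for _ in range(n_iter):
        new_labels = [0 if abs(v - c0) <= abs(v - c1) else 1 for v in x]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        g0 = [v for v, lab in zip(x, labels) if lab == 0]
        g1 = [v for v, lab in zip(x, labels) if lab == 1]
        c0, c1 = sum(g0) / len(g0), sum(g1) / len(g1)
    return labels


def naive_z_pvalue(x, labels, sigma=1.0):
    """Two-sided z-test p-value that ignores how the clusters were chosen."""
    g0 = [v for v, lab in zip(x, labels) if lab == 0]
    g1 = [v for v, lab in zip(x, labels) if lab == 1]
    diff = sum(g0) / len(g0) - sum(g1) / len(g1)
    se = sigma * math.sqrt(1.0 / len(g0) + 1.0 / len(g1))
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(diff / se) / math.sqrt(2.0))


def naive_rejection_rate(n_obs=30, n_sims=500, alpha=0.05, seed=0):
    """Fraction of null datasets on which the naive test rejects at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        # Global null: every observation comes from the same N(0, 1).
        x = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        labels = two_means_1d(x)
        if naive_z_pvalue(x, labels) < alpha:
            rejections += 1
    return rejections / n_sims


rate = naive_rejection_rate()
print(f"naive rejection rate under the null: {rate:.3f} (nominal level 0.05)")
```

Because k-means places the two clusters on opposite sides of the sample, the difference in cluster means is large by construction, and the naive test rejects far more often than the nominal 5%. A selective p-value of the kind the abstract proposes instead conditions on the clustering outcome; that computation is not reproduced here.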
