Kernel Two-Sample Tests for Manifold Data
Xiuyuan Cheng, Yao Xie
- Code (official): github.com/xycheng/manifold_mmd
Abstract
We present a study of a kernel-based two-sample test statistic related to the Maximum Mean Discrepancy (MMD) in the manifold data setting, assuming that high-dimensional observations are close to a low-dimensional manifold. We characterize the test level and power in relation to the kernel bandwidth, the number of samples, and the intrinsic dimensionality of the manifold. Specifically, when data densities $p$ and $q$ are supported on a $d$-dimensional sub-manifold $M$ embedded in an $m$-dimensional space and are Hölder with order $\beta$ (up to 2) on $M$, we prove a guarantee of the test power for finite sample size $n$ that exceeds a threshold depending on $d$, $\beta$, and $\Delta_2$, the squared $L^2$-divergence between $p$ and $q$ on the manifold, and with a properly chosen kernel bandwidth $\gamma$. For small density departures, we show that with large $n$ they can be detected by the kernel test when $\Delta_2$ is greater than $n^{-2\beta/(d+4\beta)}$ up to a certain constant and $\gamma$ scales as $n^{-1/(d+4\beta)}$. The analysis extends to cases where the manifold has a boundary and the data samples contain high-dimensional additive noise. Our results indicate that the kernel two-sample test has no curse of dimensionality when the data lie on or near a low-dimensional manifold. We validate our theory and the properties of the kernel test for manifold data through a series of numerical experiments.
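To make the setting concrete, the following is a minimal sketch of a generic kernel two-sample test on manifold data. It is not the authors' code and not necessarily their exact statistic: it assumes a Gaussian kernel, the standard biased (V-statistic) estimate of squared MMD, and a permutation test to calibrate the threshold. The function names (`gaussian_kernel`, `mmd2_from_gram`, `permutation_mmd_test`, `bandwidth_from_rate`, `sample_circle`), the constant `scale`, and the toy data (a unit circle, so intrinsic dimension 1, embedded in a higher-dimensional space) are all hypothetical choices for illustration. The bandwidth helper only mirrors the rate stated in the abstract, a bandwidth proportional to n^{-1/(d + 4*beta)} with beta the Hölder order of the densities; the abstract does not give the constant.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth):
    """Gaussian kernel matrix exp(-||a_i - b_j||^2 / (2 * bandwidth^2))."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def mmd2_from_gram(gram, n_x):
    """Biased (V-statistic) squared MMD from the Gram matrix of pooled data."""
    k_xx = gram[:n_x, :n_x]
    k_yy = gram[n_x:, n_x:]
    k_xy = gram[:n_x, n_x:]
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()


def permutation_mmd_test(x, y, bandwidth, n_perm=200, seed=0):
    """Return (observed squared MMD, permutation p-value)."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([x, y])
    gram = gaussian_kernel(pooled, pooled, bandwidth)
    n_x = len(x)
    observed = mmd2_from_gram(gram, n_x)
    null_stats = np.empty(n_perm)
    for i in range(n_perm):
        # Relabel the pooled samples by permuting rows and columns together.
        perm = rng.permutation(len(pooled))
        null_stats[i] = mmd2_from_gram(gram[np.ix_(perm, perm)], n_x)
    p_value = (1 + np.sum(null_stats >= observed)) / (1 + n_perm)
    return observed, p_value


def bandwidth_from_rate(n, d, beta, scale=1.0):
    """Bandwidth following the rate n^{-1/(d + 4*beta)}; `scale` is an assumed constant."""
    return scale * n ** (-1.0 / (d + 4.0 * beta))


def sample_circle(n, ambient_dim, rng, concentration=None):
    """Sample n points on a unit circle embedded in R^ambient_dim.

    Angles are uniform when `concentration` is None, otherwise von Mises
    distributed around angle 0 with the given concentration.
    """
    if concentration is None:
        theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
    else:
        theta = rng.vonmises(0.0, concentration, size=n)
    points = np.zeros((n, ambient_dim))
    points[:, 0] = np.cos(theta)
    points[:, 1] = np.sin(theta)
    return points


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, beta = 200, 1, 2.0
    x = sample_circle(n, ambient_dim=20, rng=rng)                     # density p
    y = sample_circle(n, ambient_dim=20, rng=rng, concentration=2.0)  # density q
    h = bandwidth_from_rate(n, d, beta)
    stat, p_value = permutation_mmd_test(x, y, h)
    print(f"bandwidth={h:.3f}  MMD^2={stat:.4f}  p-value={p_value:.4f}")
```

In this sketch the bandwidth depends on the intrinsic dimension `d` of the circle rather than on `ambient_dim`, which is the point the abstract makes about avoiding the curse of dimensionality. The Gram matrix is computed once and reindexed for each permutation, so the cost of the permutation loop is dominated by array indexing rather than kernel evaluation.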