SOTAVerified

A New Robust Partial p-Wasserstein-Based Metric for Comparing Distributions

2024-05-06Unverified0· sign in to hype

Sharath Raghvendra, Pouyan Shirzadian, Kaiyi Zhang

Unverified — Be the first to reproduce this paper.

Reproduce

Abstract

The 2-Wasserstein distance is sensitive to minor geometric differences between distributions, making it a very powerful dissimilarity metric. However, due to this sensitivity, a small outlier mass can also cause a significant increase in the 2-Wasserstein distance between two similar distributions. Similarly, sampling discrepancy can cause the empirical 2-Wasserstein distance on n samples in R^2 to converge to the true distance at a rate of n^-1/4, which is significantly slower than the rate of n^-1/2 for 1-Wasserstein distance. We introduce a new family of distances parameterized by k 0, called k-RPW that is based on computing the partial 2-Wasserstein distance. We show that (1) k-RPW satisfies the metric properties, (2) k-RPW is robust to small outlier mass while retaining the sensitivity of 2-Wasserstein distance to minor geometric differences, and (3) when k is a constant, k-RPW distance between empirical distributions on n samples in R^2 converges to the true distance at a rate of n^-1/3, which is faster than the convergence rate of n^-1/4 for the 2-Wasserstein distance. Using the partial p-Wasserstein distance, we extend our distance to any p [1,]. By setting parameters k or p appropriately, we can reduce our distance to the total variation, p-Wasserstein, and the L\'evy-Prokhorov distances. Experiments show that our distance function achieves higher accuracy in comparison to the 1-Wasserstein, 2-Wasserstein, and TV distances for image retrieval tasks on noisy real-world data sets.

Tasks

Reproductions