
Optimal Identity Testing with High Probability

2017-08-09

Ilias Diakonikolas, Themis Gouleakis, John Peebles, Eric Price


Abstract

We study the problem of testing identity against a given distribution with a focus on the high confidence regime. More precisely, given samples from an unknown distribution p over n elements, an explicitly given distribution q, and parameters 0 < ε, δ < 1, we wish to distinguish, with probability at least 1 − δ, whether the distributions are identical versus ε-far in total variation distance. Most prior work focused on the case that δ = Ω(1), for which the sample complexity of identity testing is known to be Θ(√n/ε²). Given such an algorithm, one can achieve arbitrarily small values of δ via black-box amplification, which multiplies the required number of samples by Θ(log(1/δ)). We show that black-box amplification is suboptimal for any δ = o(1), and give a new identity tester that achieves the optimal sample complexity. Our new upper and lower bounds show that the optimal sample complexity of identity testing is \[ \Theta\!\left( \frac{1}{\epsilon^2}\left( \sqrt{n \log(1/\delta)} + \log(1/\delta) \right) \right) \] for any n, ε, and δ. For the special case of uniformity testing, where the given distribution is the uniform distribution U_n over the domain, our new tester is surprisingly simple: to test whether p = U_n versus d_TV(p, U_n) ≥ ε, we simply threshold d_TV(p̂, U_n), where p̂ is the empirical probability distribution. The fact that this simple "plug-in" estimator is sample-optimal is surprising, even in the case of constant δ. Indeed, it was believed that such a tester would not attain sublinear sample complexity even for constant values of ε and δ.
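The plug-in uniformity tester described in the abstract can be sketched in a few lines: compute the empirical distribution p̂ from the samples, measure its total variation distance to U_n, and accept "p = U_n" if that distance falls below a cutoff. This is only an illustration of the estimator's structure; the paper's analysis prescribes a specific sample size and acceptance threshold as functions of n, ε, and δ, which are left here as caller-supplied parameters.

```python
from collections import Counter

def tv_to_uniform(samples, n):
    """Total variation distance between the empirical distribution of
    `samples` (over the domain {0, ..., n-1}) and the uniform distribution U_n.

    d_TV(p-hat, U_n) = (1/2) * sum_i |p-hat(i) - 1/n|.
    """
    m = len(samples)
    counts = Counter(samples)
    # Elements that appear in the sample contribute |count/m - 1/n|.
    total = sum(abs(c / m - 1.0 / n) for c in counts.values())
    # Elements never sampled have empirical mass 0, contributing 1/n each.
    total += (n - len(counts)) / n
    return total / 2.0

def plug_in_uniformity_test(samples, n, threshold):
    """Accept 'p = U_n' iff the empirical TV distance is at most `threshold`.

    NOTE: `threshold` is a hypothetical illustrative parameter; the paper
    derives the correct choice from n, epsilon, and delta.
    """
    return tv_to_uniform(samples, n) <= threshold
```

For intuition: with n = 2, the samples [0, 0, 1, 1] give an empirical distribution exactly equal to U_2 (distance 0), while [0, 0, 0, 0] give empirical distance 1/2, so the tester accepts the first and rejects the second for any threshold strictly between 0 and 1/2.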
