Sample-Optimal Density Estimation in Nearly-Linear Time
Jayadev Acharya, Ilias Diakonikolas, Jerry Li, Ludwig Schmidt
Unverified — Be the first to reproduce this paper.
ReproduceAbstract
We design a new, fast algorithm for agnostically learning univariate probability distributions whose densities are well approximated by piecewise polynomial functions. Let f be the density function of an arbitrary univariate distribution, and suppose that f is OPT-close in L_1-distance to an unknown piecewise polynomial function with t interval pieces and degree d. Our algorithm draws n = O(t(d+1)/^2) samples from f, runs in time O(n poly(d)), and with probability at least 9/10 outputs an O(t)-piecewise degree-d hypothesis h that is 4 OPT + close to f. Our general algorithm yields (nearly) sample-optimal and nearly-linear time estimators for a wide range of structured distribution families over both continuous and discrete domains in a unified way. For most of our applications, these are the first sample-optimal and nearly-linear time estimators in the literature. As a consequence, our work resolves the sample and computational complexities of a broad class of inference tasks via a single "meta-algorithm". Moreover, we experimentally demonstrate that our algorithm performs very well in practice. Our algorithm consists of three "levels": (i) At the top level, we employ an iterative greedy algorithm for finding a good partition of the real line into the pieces of a piecewise polynomial. (ii) For each piece, we show that the sub-problem of finding a good polynomial fit on the current interval can be solved efficiently with a separation oracle method. (iii) We reduce the task of finding a separating hyperplane to a combinatorial problem and give an efficient algorithm for this problem. Combining these three procedures gives a density estimation algorithm with the claimed guarantees.