Efficient Statistics With Unknown Truncation, Polynomial Time Algorithms, Beyond Gaussians
Jane H. Lee, Anay Mehrotra, Manolis Zampetakis
Unverified — Be the first to reproduce this paper.
ReproduceAbstract
We study the estimation of distributional parameters when samples are shown only if they fall in some unknown set S R^d. Kontonis, Tzamos, and Zampetakis (FOCS'19) gave a d^poly(1/) time algorithm for finding -accurate parameters for the special case of Gaussian distributions with diagonal covariance matrix. Recently, Diakonikolas, Kane, Pittas, and Zarifis (COLT'24) showed that this exponential dependence on 1/ is necessary even when S belongs to some well-behaved classes. These works leave the following open problems which we address in this work: Can we estimate the parameters of any Gaussian or even extend beyond Gaussians? Can we design poly(d/) time algorithms when S is a simple set such as a halfspace? We make progress on both of these questions by providing the following results: 1. Toward the first question, we give a d^poly(/) time algorithm for any exponential family that satisfies some structural assumptions and any unknown set S that is -approximable by degree- polynomials. This result has two important applications: 1a) The first algorithm for estimating arbitrary Gaussian distributions from samples truncated to an unknown S; and 1b) The first algorithm for linear regression with unknown truncation and Gaussian features. 2. To address the second question, we provide an algorithm with runtime poly(d/) that works for a set of exponential families (containing all Gaussians) when S is a halfspace or an axis-aligned rectangle. Along the way, we develop tools that may be of independent interest, including, a reduction from PAC learning with positive and unlabeled samples to PAC learning with positive and negative samples that is robust to certain covariate shifts.