
Sample-Efficient Linear Regression with Self-Selection Bias

2024-02-22

Jason Gaitonde, Elchanan Mossel


Abstract

We consider the problem of linear regression with self-selection bias in the unknown-index setting, as introduced in recent work by Cherapanamjeri, Daskalakis, Ilyas, and Zampetakis [STOC 2023]. In this model, one observes $m$ i.i.d. samples $(\mathbf{x}_{\ell}, z_{\ell})_{\ell=1}^m$ where $z_{\ell} = \max_{i \in [k]} \{\mathbf{x}_{\ell}^T \mathbf{w}_i + \eta_{i,\ell}\}$, but the maximizing index $i_{\ell}$ is unobserved. Here, the $\mathbf{x}_{\ell}$ are assumed to be $\mathcal{N}(0, I_n)$ and the noise distribution $\boldsymbol{\eta}_{\ell} \sim \mathcal{D}$ is centered and independent of $\mathbf{x}_{\ell}$. We provide a novel and near optimally sample-efficient (in terms of $k$) algorithm to recover $\mathbf{w}_1, \ldots, \mathbf{w}_k \in \mathbb{R}^n$ up to additive $\ell_2$-error $\varepsilon$ with polynomial sample complexity $O(n) \cdot \mathrm{poly}(k, 1/\varepsilon)$ and significantly improved time complexity $\mathrm{poly}(n, k, 1/\varepsilon) + O(\log(k)/\varepsilon)^{O(k)}$. When $k = O(1)$, our algorithm runs in $\mathrm{poly}(n, 1/\varepsilon)$ time, generalizing the polynomial guarantee of an explicit moment-matching algorithm of Cherapanamjeri et al. for $k = 2$ and when it is known that $\mathcal{D} = \mathcal{N}(0, I_k)$. Our algorithm succeeds under significantly relaxed noise assumptions, and therefore also succeeds in the related setting of max-linear regression where the added noise is taken outside the maximum. For this problem, our algorithm is efficient in a much larger range of $k$ than the state-of-the-art due to Ghosh, Pananjady, Guntuboyina, and Ramchandran [IEEE Trans. Inf. Theory 2022] for not too small $\varepsilon$, and leads to improved algorithms for any $\varepsilon$ by providing a warm start for existing local convergence methods.
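To make the observation models concrete, below is a minimal simulation sketch of the two data-generating processes described in the abstract: the unknown-index self-selection model (noise inside the maximum, argmax hidden) and the related max-linear regression model (noise outside the maximum). The function names, parameters, and the choice of Gaussian noise are illustrative assumptions; the paper allows a general centered noise distribution and, of course, addresses recovery of the regressors, which is not implemented here.

```python
import numpy as np

def sample_self_selection(m, n, k, W, noise_std=1.0, seed=None):
    """Unknown-index self-selection model from the abstract:
    z_l = max_{i in [k]} { x_l^T w_i + eta_{i,l} }, with the maximizing index hidden.

    W is a (k, n) array whose rows are the unknown regressors w_1, ..., w_k.
    Gaussian noise is used purely for illustration; the paper only requires a
    centered noise distribution D independent of x_l.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((m, n))                # covariates x_l ~ N(0, I_n)
    eta = noise_std * rng.standard_normal((m, k))  # per-index noise eta_{i,l}
    scores = X @ W.T + eta                         # (m, k) array of x_l^T w_i + eta_{i,l}
    z = scores.max(axis=1)                         # only the maximum value is observed
    return X, z                                    # the maximizing index is NOT returned

def sample_max_linear(m, n, k, W, noise_std=1.0, seed=None):
    """Related max-linear regression model: noise added outside the maximum,
    z_l = max_{i in [k]} x_l^T w_i + eta_l."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((m, n))
    z = (X @ W.T).max(axis=1) + noise_std * rng.standard_normal(m)
    return X, z

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, k, m = 10, 3, 5000
    W = rng.standard_normal((k, n))                # ground-truth regressors, unknown to the learner
    X, z = sample_self_selection(m, n, k, W, seed=1)
    print(X.shape, z.shape)                        # (5000, 10) (5000,)
```

A reproduction would feed samples drawn this way to the recovery algorithm and check that the estimated regressors are within additive $\ell_2$-error $\varepsilon$ of the rows of $W$, up to permutation of the $k$ indices.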
