On the Dichotomy Between Privacy and Traceability in ℓ_p Stochastic Convex Optimization
Sasha Voitovych, Mahdi Haghifam, Idan Attias, Gintare Karolina Dziugaite, Roi Livni, Daniel M. Roy
Abstract
In this paper, we investigate the necessity of memorization in stochastic convex optimization (SCO) under ℓ_p geometries. Informally, we say a learning algorithm memorizes m samples (or is m-traceable) if, by analyzing its output, it is possible to identify at least m of its training samples. Our main results uncover a fundamental tradeoff between traceability and excess risk in SCO. For every p ∈ [1, ∞), we establish the existence of a risk threshold below which any sample-efficient learner must memorize a constant fraction of its samples. For p ∈ [1, 2], this threshold coincides with the best risk achievable by differentially private (DP) algorithms; that is, above this threshold, there are algorithms that do not memorize even a single sample. This establishes a sharp dichotomy between privacy and traceability for p ∈ [1, 2]. For p ∈ (2, ∞), the threshold instead yields novel lower bounds for DP learning, partially closing an open problem in this setting. En route to proving these results, we introduce a complexity notion we term the trace value of a problem, which unifies privacy lower bounds and traceability results, and we prove a sparse variant of the fingerprinting lemma.
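To make the informal notion of traceability concrete, here is a minimal, hypothetical simulation of a fingerprinting-style tracing attack. This is not the paper's construction; it is a standard illustration in which a non-private mechanism releases the empirical mean of its training set, and a correlation score then distinguishes training samples from fresh samples drawn from the same distribution. All names and parameters here (`d`, `n`, the `score` function) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 2000, 50  # illustrative: dimension and sample size

# Random per-coordinate biases p_j in (-1, 1), as in fingerprinting constructions.
p = rng.uniform(-1, 1, size=d)

# Training samples in {-1, +1}^d, coordinate j equal to +1 with prob (1 + p_j)/2.
X = np.where(rng.random((n, d)) < (1 + p) / 2, 1.0, -1.0)

# The mechanism's output: the exact empirical mean (accurate but non-private).
q = X.mean(axis=0)

def score(z):
    # Correlation of a candidate sample with the output, centered at the prior mean.
    return np.dot(z - p, q - p)

# Scores of the actual training samples...
in_scores = np.array([score(X[i]) for i in range(n)])

# ...versus fresh samples from the same distribution, never seen by the mechanism.
out = np.where(rng.random((n, d)) < (1 + p) / 2, 1.0, -1.0)
out_scores = np.array([score(z) for z in out])

# Training samples correlate far more strongly with the output, so a threshold
# on the score traces (identifies) members of the training set.
print(in_scores.mean(), out_scores.mean())
```

In this toy setting the in-sample scores concentrate around a large positive value while out-of-sample scores concentrate around zero, so the released mean makes a constant fraction of the training set traceable; adding enough noise to the output (as a DP mechanism would) is precisely what destroys this correlation.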