An improved analysis of the ER-SpUD dictionary learning algorithm
Jarosław Błasiok, Jelani Nelson
Abstract
In "dictionary learning" we observe Y = AX + E for some YR^n p, A R^m n, and XR^m p. The matrix Y is observed, and A, X, E are unknown. Here E is "noise" of small norm, and X is column-wise sparse. The matrix A is referred to as a dictionary, and its columns as atoms. Then, given some small number p of samples, i.e.\ columns of Y, the goal is to learn the dictionary A up to small error, as well as X. The motivation is that in many applications data is expected to sparse when represented by atoms in the "right" dictionary A (e.g.\ images in the Haar wavelet basis), and the goal is to learn A from the data to then use it for other applications. Recently, [SWW12] proposed the dictionary learning algorithm ER-SpUD with provable guarantees when E = 0 and m = n. They showed if X has independent entries with an expected s non-zeroes per column for 1 s n, and with non-zero entries being subgaussian, then for p n^2^2 n with high probability ER-SpUD outputs matrices A', X' which equal A, X up to permuting and scaling columns (resp.\ rows) of A (resp.\ X). They conjectured p n n suffices, which they showed was information theoretically necessary for any algorithm to succeed when s 1. Significant progress was later obtained in [LV15]. We show that for a slight variant of ER-SpUD, p n(n/) samples suffice for successful recovery with probability 1-. We also show that for the unmodified ER-SpUD, p n^1.99 samples are required even to learn A, X with polynomially small success probability. This resolves the main conjecture of [SWW12], and contradicts the main result of [LV15], which claimed that p n^4 n guarantees success whp.