Power-Law Spectrum of the Random Feature Model

2026-03-15

Elliot Paquette, Ke Liang Xiao, Yizhe Zhu


Abstract

Scaling laws for neural networks, in which the loss decays as a power law in the number of parameters, data, and compute, depend fundamentally on the spectral structure of the data covariance, with power-law eigenvalue decay appearing ubiquitously in vision and language tasks. A central question is whether this spectral structure is preserved or destroyed when data passes through the basic building block of a neural network: a random linear projection followed by a nonlinear activation. We study this question for the random feature model: given data x ∼ N(0, H) ∈ R^v where H has an α-power-law spectrum (λ_j(H) ≍ j^{-α}, α > 1), a Gaussian sketch matrix W ∈ R^{v×d}, and an entrywise monomial activation f(y) = y^p, we characterize the eigenvalues of the population random-feature covariance E_x[(1/d) f(W^⊤x)^{⊗2}]. We prove matching upper and lower bounds: for all 1 ≤ j ≤ c_1 d log^{-(p+1)}(d), the j-th eigenvalue is of order (log^{p-1}(j+1)/j)^α; for c_1 d log^{-(p+1)}(d) ≤ j ≤ d, the j-th eigenvalue is of order j^{-α} up to a polylog factor. That is, the power-law exponent α is inherited exactly from the input covariance, modified only by a logarithmic correction that depends on the monomial degree p. The proof combines a dyadic head-tail decomposition with Wick chaos expansions for higher-order monomials and random matrix concentration inequalities.
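For intuition, the inherited power law can be checked numerically. The Python sketch below is a minimal illustration under assumed conventions, not the paper's code: the concrete values of v, d, α, p, the sample count, and the unit-variance entries of W are choices the abstract does not fix. It builds H with λ_j = j^{-α}, forms the monomial random features, estimates the d×d covariance by Monte Carlo, and fits the log-log slope of its eigenvalues, which the theorem predicts should come out near -α up to polylog corrections.

```python
# Monte Carlo sketch (illustrative, not the authors' code): check that the
# spectrum of the population random-feature covariance
#   E_x[(1/d) f(W^T x) f(W^T x)^T],  f(y) = y^p,
# inherits the alpha-power-law of the input covariance H, up to log factors.
# v, d, alpha, p, n_samples and the N(0,1) entries of W are assumptions.
import numpy as np

rng = np.random.default_rng(0)

v, d = 1000, 200           # ambient dimension v, number of random features d
alpha, p = 2.0, 3          # power-law exponent alpha > 1, monomial degree p
n_samples = 10_000         # Monte Carlo samples for the covariance estimate

# Input covariance H with power-law spectrum lambda_j(H) = j^{-alpha};
# taking H diagonal costs no generality when only the spectrum matters.
lam = np.arange(1, v + 1, dtype=float) ** (-alpha)

# Gaussian sketch matrix W in R^{v x d} with i.i.d. N(0, 1) entries.
W = rng.normal(size=(v, d))

# Draw x ~ N(0, H): scale i.i.d. standard Gaussians by sqrt(lambda_j).
X = rng.normal(size=(n_samples, v)) * np.sqrt(lam)

# Entrywise monomial random features f(W^T x) = (W^T x)^p.
Phi = (X @ W) ** p                    # shape (n_samples, d)

# Empirical estimate of the d x d covariance E_x[(1/d) f(W^T x)^{(x)2}].
K = (Phi.T @ Phi) / (n_samples * d)

eigs = np.sort(np.linalg.eigvalsh(K))[::-1]   # descending eigenvalues

# Log-log slope over the bulk of the spectrum; the theorem predicts a
# decay exponent of -alpha up to polylogarithmic corrections.
j = np.arange(1, d + 1)
mask = (j >= 5) & (j <= d // 2)
slope = np.polyfit(np.log(j[mask]), np.log(eigs[mask]), 1)[0]
print(f"fitted spectral decay exponent: {slope:.2f} (theory: -{alpha})")
```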
