Learning Hierarchical Polynomials of Multiple Nonlinear Features with Three-Layer Networks
Hengyu Fu, ZiHao Wang, Eshaan Nichani, Jason D. Lee
Abstract
In deep learning theory, a critical question is to understand how neural networks learn hierarchical features. In this work, we study the learning of hierarchical polynomials of multiple nonlinear features using three-layer neural networks. We examine a broad class of functions of the form $f^\star = g^\star \circ \mathbf{p}$, where $\mathbf{p}: \mathbb{R}^d \to \mathbb{R}^r$ represents multiple quadratic features with $r \ll d$ and $g^\star: \mathbb{R}^r \to \mathbb{R}$ is a polynomial of degree $p$. This can be viewed as a nonlinear generalization of the multi-index model [damian2022neural], and also as an extension of previous work that focused only on a single nonlinear feature, i.e. $r = 1$ [nichani2023provable, wang2023learning]. Our primary contribution shows that a three-layer neural network trained via layerwise gradient descent suffices for (i) complete recovery of the space spanned by the nonlinear features and (ii) efficient learning of the target function $f^\star = g^\star \circ \mathbf{p}$, or transfer learning of $f = g \circ \mathbf{p}$ with a different link function, within $\widetilde{\mathcal{O}}(d^4)$ samples and polynomial time. For such hierarchical targets, our result substantially improves on the $\Theta(d^{2p})$ sample complexity of kernel methods, demonstrating the power of efficient feature learning. Our results rely on novel techniques and go beyond all prior settings, including single-index and multi-index models as well as models depending on just one nonlinear feature, contributing to a more comprehensive understanding of feature learning in deep learning.
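To make the target class concrete, the following is a minimal sketch of a hierarchical polynomial built from several quadratic features composed with a polynomial link. It is an illustration only, not the construction analyzed in the paper: the Gaussian entries of the feature matrices, the particular degree-3 link, and all function names (`make_quadratic_features`, `features`, `link`, `target`) are assumptions chosen for the example. The last lines compare the two sample-complexity scales quoted in the abstract, ignoring constants and logarithmic factors.

```python
import random


def make_quadratic_features(d, r, seed=0):
    """Return r symmetric d x d matrices, one per quadratic feature.

    Assumption for illustration: entries are i.i.d. Gaussian, scaled by 1/d.
    """
    rng = random.Random(seed)
    mats = []
    for _ in range(r):
        A = [[0.0] * d for _ in range(d)]
        for i in range(d):
            for j in range(i, d):
                v = rng.gauss(0.0, 1.0) / d
                A[i][j] = v
                A[j][i] = v  # enforce symmetry
        mats.append(A)
    return mats


def features(x, mats):
    """Quadratic feature map: returns (x^T A_1 x, ..., x^T A_r x)."""
    d = len(x)
    return [
        sum(x[i] * A[i][j] * x[j] for i in range(d) for j in range(d))
        for A in mats
    ]


def link(z):
    """Hypothetical degree-3 polynomial link from R^r to R."""
    s = sum(z)
    return s ** 3 - 2.0 * z[0] * z[-1] + 1.0


def target(x, mats):
    """Hierarchical target: the link applied to the quadratic features."""
    return link(features(x, mats))


d, r, p = 20, 3, 3  # input dimension, number of features, link degree
mats = make_quadratic_features(d, r)
rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(d)]
y = target(x, mats)

# Scales quoted in the abstract (constants and log factors dropped).
three_layer_scale = d ** 4
kernel_scale = d ** (2 * p)
```

As a function of the input, this target is a polynomial of degree up to $2p$, which is why a kernel method that ignores the hierarchical structure pays the $d^{2p}$ scale, while the stated three-layer result depends on $d^4$ regardless of $p$.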