When big data actually are low-rank, or entrywise approximation of certain function-generated matrices
Stanislav Budzinskiy
Code: github.com/sbudzinskiy/low-rank-big-data
Abstract
The article concerns low-rank approximation of matrices generated by sampling a smooth function of two m-dimensional variables. We identify several misconceptions surrounding a claim that, for a specific class of analytic functions, such n × n matrices admit accurate entrywise approximation of rank that is independent of m and grows as log(n) -- colloquially known as ''big-data matrices are approximately low-rank''. We provide a theoretical explanation of the numerical results presented in support of this claim, describing three narrower classes of functions for which function-generated matrices can be approximated within an entrywise error of order ε with rank O(log(n) ε^{-2} log(ε^{-1})) that is independent of the dimension m: (i) functions of the inner product of the two variables, (ii) functions of the Euclidean distance between the variables, and (iii) shift-invariant positive-definite kernels. We extend our argument to tensor-train approximation of tensors generated with functions of the ''higher-order inner product'' of their multiple variables. We discuss our results in the context of low-rank approximation of (a) growing datasets and (b) attention in transformer neural networks.
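As a hedged illustration of case (i) -- not the paper's own code or proof technique -- the sketch below samples unit vectors in R^m, forms the matrix A[i,j] = f(⟨x_i, y_j⟩) for an analytic f, and measures the entrywise (Chebyshev) error of a truncated SVD at a fixed rank. The function name, the choice f = exp, and all parameter values are illustrative assumptions; the point is only that the achievable entrywise error at a fixed moderate rank stays controlled as the dimension m grows.

```python
# Illustrative sketch (assumed setup, not the paper's experiments):
# matrices A[i,j] = f(<x_i, y_j>) with unit-norm points in dimension m,
# approximated by a truncated SVD of fixed rank r independent of m.
import numpy as np

def entrywise_error_of_rank(f, m, n, r, seed=0):
    """Max |A - A_r| for A[i,j] = f(<x_i, y_j>) and A_r a rank-r truncated SVD."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, m))
    X /= np.linalg.norm(X, axis=1, keepdims=True)  # points on the unit sphere
    Y = rng.standard_normal((n, m))
    Y /= np.linalg.norm(Y, axis=1, keepdims=True)
    A = f(X @ Y.T)                                 # function of the inner product
    U, s, Vt = np.linalg.svd(A)
    A_r = (U[:, :r] * s[:r]) @ Vt[:r]              # rank-r truncation
    return np.max(np.abs(A - A_r))                 # entrywise (Chebyshev) error

# Growing dimension m, fixed size n and rank r: the error does not blow up with m
# (for large m, inner products concentrate near 0, which helps rather than hurts).
for m in (10, 100, 1000):
    print(m, entrywise_error_of_rank(np.exp, m=m, n=300, r=40))
```

This is only a numerical sanity check at one rank; the paper's O(log(n) ε^{-2} log(ε^{-1})) bound concerns how the required rank scales with n and ε, which a single truncation level does not probe.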