
Efficient 1-bit tensor approximations

2024-10-02

Alex W. Neal Riasanovsky, Sarah El Kazdadi


Abstract

We present a spatially efficient decomposition of matrices and arbitrary-order tensors as linear combinations of tensor products of {-1, 1}-valued vectors. For any matrix A ∈ ℝ^{m×n},

A − R_w = S_w C_w T_w^T = Σ_{j=1}^{w} c_j s_j t_j^T

is a w-width signed cut decomposition of A. Here C_w = diag(c_w) for some c_w ∈ ℝ^w, and S_w, T_w, and the vectors s_j, t_j are {-1, 1}-valued. To store (S_w, T_w, C_w), we may pack w(m + n) bits and require only w floating point numbers. As a function of w, ‖R_w‖_F exhibits exponential decay when applied to f32 matrices with i.i.d. N(0, 1) entries. Choosing w so that (S_w, T_w, C_w) has the same memory footprint as an f16 or bf16 matrix, the relative error is comparable. Our algorithm yields efficient signed cut decompositions in 20 lines of pseudocode. It reflects a simple modification of a celebrated 1999 paper [1] of Frieze and Kannan. As a first application, we approximate the weight matrices in the open Mistral-7B-v0.1 Large Language Model at 50% spatial compression. Remarkably, all 226 remainder matrices have a relative error below 6%, and the expanded model closely matches Mistral-7B-v0.1 on the huggingface leaderboard [2]. Benchmark performance degrades slowly as we reduce the spatial compression from 50% to 25%. We optimize our open-source rust implementation [3] with simd instructions on avx2 and avx512 architectures. We also extend our algorithm from matrices to tensors of arbitrary order and use it to compress a picture of the first author's cat Angus.
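To make the decomposition concrete, here is a minimal NumPy sketch of one plausible greedy scheme in the Frieze–Kannan spirit: for each width step, alternating sign updates heuristically maximize s^T R t, the least-squares coefficient c_j = (s^T R t)/(mn) is recorded, and the rank-one term c_j s_j t_j^T is subtracted from the remainder. This is an illustrative assumption about the update rule, not the authors' exact 20-line algorithm or their optimized rust implementation.

```python
import numpy as np

def signed_cut_decompose(A, w, iters=20):
    """Greedy w-width signed cut decomposition A = S @ diag(c) @ T.T + R.

    Hedged sketch: alternating sign maximization of s^T R t per width step;
    the paper's exact update rule may differ.
    """
    m, n = A.shape
    R = A.astype(np.float64).copy()          # running remainder
    S = np.empty((m, w))
    T = np.empty((n, w))
    c = np.empty(w)
    for j in range(w):
        # Initialize s from row sums, then alternate sign updates.
        s = np.sign(R.sum(axis=1))
        s[s == 0] = 1.0
        for _ in range(iters):
            t = np.sign(R.T @ s); t[t == 0] = 1.0
            s = np.sign(R @ t);   s[s == 0] = 1.0
        # Least-squares coefficient for the rank-one term c * s * t^T.
        c[j] = (s @ R @ t) / (m * n)
        R -= c[j] * np.outer(s, t)
        S[:, j], T[:, j] = s, t
    return S, c, T, R
```

By construction the reconstruction is exact (A equals the signed cut part plus the final remainder), and each step can only shrink ‖R‖_F, since subtracting the least-squares rank-one term removes (s^T R t)^2 / (mn) from ‖R‖_F^2.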
