Quantized sparse PCA for neural network weight compression

2021-09-29

Andrey Kuzmin, Mart van Baalen, Markus Nagel, Arash Behboodi


Abstract

In this paper, we introduce a novel method of weight compression. In our method, we store weight tensors as sparse, quantized matrix factors, whose product is computed on the fly during inference to generate the target model's weight tensors. The underlying matrix-factorization problem can be cast as a quantized sparse PCA problem and solved with iterative projected gradient descent. Seen as a unification of weight SVD, vector quantization, and sparse PCA, our method matches or improves on state-of-the-art trade-offs between accuracy and model size. Unlike vector quantization, it is applicable in both the moderate and the extreme compression regimes.
