Network Pruning for Low-Rank Binary Index

2019-09-25

Dongsoo Lee, Se Jung Kwon, Byeongwook Kim, Parichay Kapoor, Gu-Yeon Wei

Abstract

Pruning is an efficient model compression technique that removes redundancy in the connectivity of deep neural networks (DNNs). A critical problem in representing sparse matrices after pruning is that, as fewer bits are used for quantization and the pruning rate increases, the index data become relatively larger compared with the quantized weights they describe. Moreover, an irregular index form leads to low parallelism for convolutions and matrix multiplications. In this paper, we propose a new network pruning technique that generates a low-rank binary index matrix to compress index data significantly. Specifically, the proposed compression method finds a particular fine-grained pruning mask that can be decomposed into two binary matrices, so that decompressing the index data requires only a simple binary matrix multiplication. We also propose a tile-based factorization technique that not only lowers memory requirements but also enhances the compression ratio. Various DNN models (including conv layers and LSTM layers) can be pruned with far fewer indices than previous sparse matrix formats require, while maintaining the same pruning rate.
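The decompression side of the idea can be illustrated with a short sketch. It assumes a Boolean (OR-of-AND) product of two 0/1 factor matrices, which the abstract does not confirm (it only says "binary matrix multiplication"). It also assumes a tile grid in which every tile carries its own pair of factors. The helper names `boolean_matmul`, `decompress_tiled` and `index_bits` are hypothetical. The sketch does not show how a factorizable mask is found, which is the actual contribution of the paper. It only shows that a rank-k index for an m × n tile costs k(m + n) bits instead of the m·n bits of a plain bitmask.

```python
from typing import List, Tuple

Matrix = List[List[int]]  # entries are 0 or 1


def boolean_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Assumed decompression rule: out[i][j] = OR over r of (a[i][r] AND b[r][j])."""
    inner = len(b)
    cols = len(b[0])
    return [
        [int(any(row[r] and b[r][j] for r in range(inner))) for j in range(cols)]
        for row in a
    ]


def decompress_tiled(grid: List[List[Tuple[Matrix, Matrix]]]) -> Matrix:
    """Rebuild a full pruning mask from a grid of per-tile (A, B) factor pairs.

    Hypothetical layout: grid[ti][tj] holds the factors of tile (ti, tj);
    all tiles in one grid row must expand to the same number of rows.
    """
    mask: Matrix = []
    for tile_row in grid:
        blocks = [boolean_matmul(a, b) for a, b in tile_row]
        for r in range(len(blocks[0])):
            # Concatenate row r of every tile in this grid row, left to right.
            mask.append([bit for block in blocks for bit in block[r]])
    return mask


def index_bits(rows: int, cols: int, rank: int) -> Tuple[int, int]:
    """Return (dense bitmask bits, rank-`rank` binary-factor bits) for one tile."""
    return rows * cols, rank * (rows + cols)


if __name__ == "__main__":
    a = [[1, 0], [0, 1], [1, 1]]          # 3 x 2 factor
    b = [[1, 1, 0, 0], [0, 0, 1, 1]]      # 2 x 4 factor
    print(boolean_matmul(a, b))           # 3 x 4 mask, 1 = keep weight
    print(index_bits(512, 512, 16))       # bitmask vs. low-rank index size
```

With these toy factors the 3 × 4 mask is `[[1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 1, 1]]`. For a 512 × 512 tile at an assumed rank of 16, the factors take 16 × (512 + 512) = 16,384 bits versus 262,144 bits for a bitmask. That saving only materializes if the pruning mask really admits such a low-rank decomposition, which is what the proposed pruning method is designed to ensure. Smaller tiles make each factorization easier to satisfy at the cost of storing more factor pairs.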

Tasks

Model Compression, Network Pruning, Quantization