SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot
Elias Frantar, Dan Alistarh
Code
- github.com/ist-daslab/sparsegpt — Official, in paper, PyTorch, ★ 877
- github.com/nvidia/tensorrt-model-optimizer — PyTorch, ★ 2,222
- github.com/nvlabs/maskllm — PyTorch, ★ 187
- github.com/baithebest/adagp — PyTorch, ★ 67
- github.com/baithebest/sparsellm — PyTorch, ★ 67
- github.com/eth-easl/deltazip — PyTorch, ★ 35
Abstract
We show for the first time that large-scale generative pretrained transformer (GPT) family models can be pruned to at least 50% sparsity in one-shot, without any retraining, at minimal loss of accuracy. This is achieved via a new pruning method called SparseGPT, specifically designed to work efficiently and accurately on massive GPT-family models. We can execute SparseGPT on the largest available open-source models, OPT-175B and BLOOM-176B, in under 4.5 hours, and can reach 60% unstructured sparsity with negligible increase in perplexity: remarkably, more than 100 billion weights from these models can be ignored at inference time. SparseGPT generalizes to semi-structured (2:4 and 4:8) patterns, and is compatible with weight quantization approaches. The code is available at: https://github.com/IST-DASLab/sparsegpt.
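The abstract mentions semi-structured 2:4 sparsity: in every contiguous group of four weights, at most two may be nonzero, which is the pattern accelerated by NVIDIA sparse tensor cores. The sketch below illustrates only this pattern using a simple magnitude criterion; it is not the SparseGPT algorithm itself, which selects and compensates pruned weights using second-order (Hessian-based) information. The function name `prune_2_4` is our own.

```python
import numpy as np

def prune_2_4(weights: np.ndarray) -> np.ndarray:
    """Zero out the 2 smallest-magnitude weights in every group of 4.

    Illustrates the 2:4 semi-structured sparsity pattern only; SparseGPT
    itself chooses which weights to drop (and updates the survivors)
    using approximate second-order information, not raw magnitudes.
    Assumes the last dimension is divisible by 4.
    """
    flat = weights.reshape(-1, 4)                      # groups of 4
    drop = np.argsort(np.abs(flat), axis=1)[:, :2]     # 2 smallest |w| per group
    mask = np.ones_like(flat, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=1)       # zero those positions
    return (flat * mask).reshape(weights.shape)

W = np.array([[0.9, -0.1, 0.4, 0.05, -0.7, 0.2, 0.03, 0.6]])
print(prune_2_4(W))  # exactly 2 nonzeros survive in each group of 4
```

Applied layer-wise to a transformer's weight matrices, this pattern gives a uniform 50% sparsity that hardware can exploit, which is why the paper reports 2:4 and 4:8 results alongside unstructured 50% sparsity.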
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| arc_challenge | OPT-175B | Accuracy | 43.94 | — | Unverified |
| arc_challenge | OPT-175B (50% Sparsity) | Accuracy | 25.6 | — | Unverified |
| arc_challenge | SparseGPT (175B, 2:4 Sparsity) | Accuracy | 38.99 | — | Unverified |
| arc_challenge | SparseGPT (175B, 4:8 Sparsity) | Accuracy | 39.85 | — | Unverified |
| arc_challenge | SparseGPT (175B, 50% Sparsity) | Accuracy | 41.3 | — | Unverified |
| arc_easy | OPT-175B | Accuracy | 71.04 | — | Unverified |
| arc_easy | OPT-175B (50% Sparsity) | Accuracy | 28.03 | — | Unverified |
| arc_easy | SparseGPT (175B, 2:4 Sparsity) | Accuracy | 67.08 | — | Unverified |
| arc_easy | SparseGPT (175B, 4:8 Sparsity) | Accuracy | 68.35 | — | Unverified |
| arc_easy | SparseGPT (175B, 50% Sparsity) | Accuracy | 69.65 | — | Unverified |