SOTAVerified

SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot

2023-01-02

Elias Frantar, Dan Alistarh

Code Available

Abstract

We show for the first time that large-scale generative pretrained transformer (GPT) family models can be pruned to at least 50% sparsity in one-shot, without any retraining, at minimal loss of accuracy. This is achieved via a new pruning method called SparseGPT, specifically designed to work efficiently and accurately on massive GPT-family models. We can execute SparseGPT on the largest available open-source models, OPT-175B and BLOOM-176B, in under 4.5 hours, and can reach 60% unstructured sparsity with negligible increase in perplexity: remarkably, more than 100 billion weights from these models can be ignored at inference time. SparseGPT generalizes to semi-structured (2:4 and 4:8) patterns, and is compatible with weight quantization approaches. The code is available at: https://github.com/IST-DASLab/sparsegpt.
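The abstract mentions semi-structured 2:4 sparsity, in which exactly two weights in every contiguous group of four are zeroed out (a pattern that NVIDIA Ampere-class GPUs can accelerate). As a minimal sketch of what that pattern means, the following applies a simple magnitude-based 2:4 mask; note this is only an illustration of the sparsity pattern itself, not SparseGPT's actual weight-selection criterion, which is Hessian-aware.

```python
import numpy as np

def prune_2_to_4(weights: np.ndarray) -> np.ndarray:
    """Zero out the 2 smallest-magnitude entries in every group of 4.

    Illustrates the 2:4 semi-structured pattern with a magnitude
    criterion only; SparseGPT selects which weights to drop using
    second-order (Hessian) information instead.
    """
    w = weights.reshape(-1, 4).copy()
    # Indices of the two smallest-magnitude entries per group of 4.
    drop = np.argsort(np.abs(w), axis=1)[:, :2]
    mask = np.ones_like(w, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=1)
    return (w * mask).reshape(weights.shape)

row = np.array([0.9, -0.1, 0.3, -0.7, 0.05, 0.6, -0.2, 0.8])
pruned = prune_2_to_4(row)
# Exactly half the weights survive: two per group of four.
```

Any 2:4-pruned matrix is automatically 50% sparse, which is why the semi-structured results in the table below can be compared directly against the unstructured 50%-sparsity rows.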

Tasks

Benchmark Results

| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| arc_challenge | OPT-175B | Accuracy | 43.94 | — | Unverified |
| arc_challenge | OPT-175B (50% sparsity) | Accuracy | 25.6 | — | Unverified |
| arc_challenge | SparseGPT 175B (2:4 sparsity) | Accuracy | 38.99 | — | Unverified |
| arc_challenge | SparseGPT 175B (4:8 sparsity) | Accuracy | 39.85 | — | Unverified |
| arc_challenge | SparseGPT 175B (50% sparsity) | Accuracy | 41.3 | — | Unverified |
| arc_easy | OPT-175B | Accuracy | 71.04 | — | Unverified |
| arc_easy | OPT-175B (50% sparsity) | Accuracy | 28.03 | — | Unverified |
| arc_easy | SparseGPT 175B (2:4 sparsity) | Accuracy | 67.08 | — | Unverified |
| arc_easy | SparseGPT 175B (4:8 sparsity) | Accuracy | 68.35 | — | Unverified |
| arc_easy | SparseGPT 175B (50% sparsity) | Accuracy | 69.65 | — | Unverified |

Reproductions