SOTAVerified

Model Compression

Model Compression has been an actively pursued research area over the last few years, with the goal of deploying state-of-the-art deep networks on low-power, resource-limited devices without a significant drop in accuracy. Parameter pruning, low-rank factorization, and weight quantization are some of the methods proposed for compressing deep networks.

Source: KD-MRI: A knowledge distillation framework for image reconstruction and image restoration in MRI workflow
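Two of the techniques named above, pruning and quantization, lend themselves to a compact illustration. The following is a minimal NumPy sketch of magnitude-based weight pruning and uniform weight quantization; it is not drawn from any of the listed papers, and the function names and toy weight matrix are illustrative assumptions.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights so roughly `sparsity` fraction are zero."""
    k = int(sparsity * weights.size)
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value serves as the pruning threshold.
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

def uniform_quantize(weights: np.ndarray, num_bits: int = 8) -> np.ndarray:
    """Quantize weights to 2**num_bits evenly spaced levels, then dequantize."""
    levels = 2 ** num_bits - 1
    w_min, w_max = weights.min(), weights.max()
    scale = (w_max - w_min) / levels if w_max > w_min else 1.0
    q = np.round((weights - w_min) / scale)
    return q * scale + w_min

# Toy usage on a random weight matrix (illustrative only).
w = np.random.randn(256, 256).astype(np.float32)
w_pruned = magnitude_prune(w, sparsity=0.9)       # ~90% of weights set to zero
w_quant = uniform_quantize(w_pruned, num_bits=4)  # 4-bit uniform quantization
print(f"sparsity: {np.mean(w_pruned == 0):.2f}, unique levels: {len(np.unique(w_quant))}")
```

In practice these steps are usually applied per layer and followed by fine-tuning to recover accuracy; the sketch only shows the core numerical operations.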

Papers

Showing 981–990 of 1356 papers

Title | Status | Hype
PFGDF: Pruning Filter via Gaussian Distribution Feature for Deep Neural Networks Acceleration | | 0
Weight Squeezing: Reparameterization for Knowledge Transfer and Model Compression | | 0
Pivoting Factorization: A Compact Meta Low-Rank Representation of Sparsity for Efficient Inference in Large Language Models | | 0
A Low Effort Approach to Structured CNN Design Using PCA | | 0
Do we need Label Regularization to Fine-tune Pre-trained Language Models? | | 0
Tensor Train Low-rank Approximation (TT-LoRA): Democratizing AI with Accelerated LLMs | | 0
A Lottery Ticket Hypothesis Framework for Low-Complexity Device-Robust Neural Acoustic Scene Classification | | 0
Towards Zero-Shot Knowledge Distillation for Natural Language Processing | | 0
Robustness Challenges in Model Distillation and Pruning for Natural Language Understanding | | 0
Position-Aware Depth Decay Decoding (D^3): Boosting Large Language Model Inference Efficiency | | 0
Page 99 of 136

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | MobileBERT + 2bit-1dim model compression using DKM | Accuracy | 82.13 | | Unverified
2 | MobileBERT + 1bit-1dim model compression using DKM | Accuracy | 63.17 | | Unverified
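The DKM entries above refer to differentiable k-means clustering of weights, where each weight is mapped to an entry of a small learned codebook. As rough intuition for what "2bit-1dim" means (a 4-entry palette over individual scalar weights), here is a sketch using plain, non-differentiable 1-D k-means; it is not the DKM algorithm itself, and all names and values are illustrative assumptions.

```python
import numpy as np

def kmeans_codebook(weights: np.ndarray, num_bits: int, iters: int = 20):
    """Cluster scalar weights into 2**num_bits centroids and return the
    quantized weights plus the learned codebook (palette)."""
    flat = weights.ravel()
    k = 2 ** num_bits
    # Initialize centroids uniformly over the weight range.
    centroids = np.linspace(flat.min(), flat.max(), k)
    for _ in range(iters):
        # Assign each weight to its nearest centroid.
        assignments = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
        # Move each centroid to the mean of its assigned weights.
        for c in range(k):
            members = flat[assignments == c]
            if members.size > 0:
                centroids[c] = members.mean()
    quantized = centroids[assignments].reshape(weights.shape)
    return quantized, centroids

# "2bit-1dim" corresponds to a 4-entry codebook over individual (scalar) weights.
w = np.random.randn(512, 512).astype(np.float32)
w_q, palette = kmeans_codebook(w, num_bits=2)
print("codebook:", palette)
```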