SOTAVerified

Model Compression

Model compression has been an actively pursued research area in recent years, with the goal of deploying state-of-the-art deep networks on low-power, resource-limited devices without a significant drop in accuracy. Parameter pruning, low-rank factorization, and weight quantization are among the methods proposed to reduce the size of deep networks.

Source: KD-MRI: A knowledge distillation framework for image reconstruction and image restoration in MRI workflow
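The three techniques named above can be sketched in a few lines of NumPy. This is a minimal illustration only; the matrix size, sparsity level, rank, and bit-width are arbitrary choices for the example, not taken from any of the listed papers.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64)).astype(np.float32)  # a dense weight matrix

# 1) Magnitude pruning: zero out the ~90% of weights smallest in |w|.
threshold = np.quantile(np.abs(W), 0.9)
W_pruned = np.where(np.abs(W) >= threshold, W, 0.0)

# 2) Low-rank factorization: approximate W by rank-r factors U @ V,
#    stored as two small matrices instead of one large one.
r = 8
U_full, s, Vt = np.linalg.svd(W, full_matrices=False)
U = U_full[:, :r] * s[:r]   # shape (64, r)
V = Vt[:r, :]               # shape (r, 64)
W_lowrank = U @ V

# 3) Uniform 8-bit quantization: map floats to int8 with a scale factor.
scale = np.abs(W).max() / 127.0
W_q = np.clip(np.round(W / scale), -127, 127).astype(np.int8)
W_dequant = W_q.astype(np.float32) * scale  # reconstruction at inference
```

Each step trades a small approximation error for storage savings: pruning keeps roughly 10% of the entries, the factorization stores 2 * 64 * r values instead of 64 * 64, and quantization replaces 32-bit floats with 8-bit integers plus one scale.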

Papers

Showing 951–1000 of 1356 papers

Titles (every paper listed below has a blank Status and a Hype score of 0):

OPTISHEAR: Towards Efficient and Adaptive Pruning of Large Language Models via Evolutionary Optimization
Oracle Teacher: Leveraging Target Information for Better Knowledge Distillation of CTC Models
A Memory-Efficient Learning Framework for Symbol-Level Precoding with Quantized NN Weights
OTOv2: Automatic, Generic, User-Friendly
Outsourcing Training without Uploading Data via Efficient Collaborative Open-Source Sampling
Towards Higher Ranks via Adversarial Weight Pruning
Pacemaker: Intermediate Teacher Knowledge Distillation For On-The-Fly Convolutional Neural Network
Pangu Light: Weight Re-Initialization for Pruning and Accelerating LLMs
Parameter Compression of Recurrent Neural Networks and Degradation of Short-term Memory
AMD: Automatic Multi-step Distillation of Large-scale Vision Models
Single-path Bit Sharing for Automatic Loss-aware Model Compression
Partitioning-Guided K-Means: Extreme Empty Cluster Resolution for Extreme Model Compression
AMD: Adaptive Masked Distillation for Object Detection
PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning
Towards Modality Transferable Visual Information Representation with Optimal Model Compression
AlphaTuning: Quantization-Aware Parameter-Efficient Adaptation of Large-Scale Pre-Trained Language Models
PCEE-BERT: Accelerating BERT Inference via Patient and Confident Early Exiting
Towards Optimal Compression: Joint Pruning and Quantization
PC-LoRA: Low-Rank Adaptation for Progressive Model Compression with Knowledge Distillation
PCNN: Pattern-based Fine-Grained Regular Pruning towards Optimizing CNN Accelerators
PCONV: The Missing but Desirable Sparsity in DNN Weight Pruning for Real-time Execution on Mobile Devices
Pea-KD: Parameter-efficient and Accurate Knowledge Distillation on BERT
Pea-KD: Parameter-efficient and accurate Knowledge Distillation
Weight Squeezing: Reparameterization for Compression and Fast Inference
Penrose Tiled Low-Rank Compression and Section-Wise Q&A Fine-Tuning: A General Framework for Domain-Specific Large Language Model Adaptation
Towards Superior Quantization Accuracy: A Layer-sensitive Approach
A Low-Power Streaming Speech Enhancement Accelerator For Edge Devices
Performance Aware Convolutional Neural Network Channel Pruning for Embedded GPUs
PERMDNN: Efficient Compressed DNN Architecture with Permuted Diagonal Matrices
Perturbation of Deep Autoencoder Weights for Model Compression and Classification of Tabular Data
PFGDF: Pruning Filter via Gaussian Distribution Feature for Deep Neural Networks Acceleration
Weight Squeezing: Reparameterization for Knowledge Transfer and Model Compression
Pivoting Factorization: A Compact Meta Low-Rank Representation of Sparsity for Efficient Inference in Large Language Models
A Low Effort Approach to Structured CNN Design Using PCA
Do we need Label Regularization to Fine-tune Pre-trained Language Models?
Tensor Train Low-rank Approximation (TT-LoRA): Democratizing AI with Accelerated LLMs
A Lottery Ticket Hypothesis Framework for Low-Complexity Device-Robust Neural Acoustic Scene Classification
Towards Zero-Shot Knowledge Distillation for Natural Language Processing
Robustness Challenges in Model Distillation and Pruning for Natural Language Understanding
Position-Aware Depth Decay Decoding (D^3): Boosting Large Language Model Inference Efficiency
Aligned Weight Regularizers for Pruning Pretrained Neural Networks
Post-Training Quantization for Video Matting
Shedding the Bits: Pushing the Boundaries of Quantization with Minifloats on FPGAs
Post-Training Weighted Quantization of Neural Networks for Language Models
PQK: Model Compression via Pruning, Quantization, and Knowledge Distillation
Practical quantum federated learning and its experimental demonstration
Precise Box Score: Extract More Information from Datasets to Improve the Performance of Face Detection
What do larger image classifiers memorise?
Preventing Catastrophic Forgetting and Distribution Mismatch in Knowledge Distillation via Synthetic Data
Preview-based Category Contrastive Learning for Knowledge Distillation
Page 20 of 28

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|-------|--------|---------|----------|--------|
| 1 | MobileBERT + 2bit-1dim model compression using DKM | Accuracy | 82.13 | | Unverified |
| 2 | MobileBERT + 1bit-1dim model compression using DKM | Accuracy | 63.17 | | Unverified |