SOTAVerified

Model Compression

Model compression has been an actively pursued area of research over the past few years, with the goal of deploying state-of-the-art deep networks on low-power, resource-limited devices without a significant drop in accuracy. Parameter pruning, low-rank factorization, and weight quantization are some of the methods proposed for reducing the size of deep networks.

Source: KD-MRI: A knowledge distillation framework for image reconstruction and image restoration in MRI workflow
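The three techniques named above can be illustrated with a minimal NumPy sketch. This is a simplified, generic illustration, not the method of any paper listed below: `magnitude_prune` zeroes the smallest-magnitude weights (unstructured pruning), `uniform_quantize` maps weights onto a uniform grid of 2^bits levels, and `low_rank_factorize` replaces a weight matrix with a truncated-SVD product of two smaller factors. All function names are hypothetical.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    """Zero out the smallest-magnitude fraction of weights (unstructured pruning)."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value becomes the pruning threshold
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

def uniform_quantize(weights, bits=8):
    """Quantize to 2**bits uniform levels, then dequantize (simulated quantization)."""
    lo, hi = weights.min(), weights.max()
    if hi == lo:  # degenerate case: constant tensor
        return weights.copy()
    scale = (hi - lo) / (2**bits - 1)
    q = np.round((weights - lo) / scale)   # integer code in [0, 2**bits - 1]
    return q * scale + lo                  # reconstructed float values

def low_rank_factorize(weights, rank):
    """Approximate W (m x n) as A @ B with A (m x rank), B (rank x n) via truncated SVD."""
    U, S, Vt = np.linalg.svd(weights, full_matrices=False)
    return U[:, :rank] * S[:rank], Vt[:rank]
```

For a layer with an `m x n` weight matrix, the low-rank variant stores `rank * (m + n)` parameters instead of `m * n`, which is a saving whenever `rank < m * n / (m + n)`; pruning and quantization instead reduce the number of nonzeros and the bits per weight, respectively.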

Papers

Showing 151–200 of 1356 papers

Title | Status | Hype
Model LEGO: Creating Models Like Disassembling and Assembling Building Blocks | Code | 1
COMCAT: Towards Efficient Compression and Customization of Attention-Based Vision Models | Code | 1
Class Attention Transfer Based Knowledge Distillation | Code | 1
LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search | Code | 1
Aligned Structured Sparsity Learning for Efficient Image Super-Resolution | Code | 1
CPrune: Compiler-Informed Model Pruning for Efficient Target-Aware DNN Execution | Code | 1
Clustered Sampling: Low-Variance and Improved Representativity for Clients Selection in Federated Learning | Code | 1
LLMCBench: Benchmarking Large Language Model Compression for Efficient Deployment | Code | 1
Communication-Computation Trade-Off in Resource-Constrained Edge Inference | Code | 1
Implicit Regularization via Neural Feature Alignment | Code | 1
HiNeRV: Video Compression with Hierarchical Encoding-based Neural Representation | Code | 1
DE-RRD: A Knowledge Distillation Framework for Recommender System | Code | 1
Head Network Distillation: Splitting Distilled Deep Neural Networks for Resource-Constrained Edge Computing Systems | Code | 1
Memory-Efficient Backpropagation through Large Linear Layers | Code | 1
DUET: A Tuning-Free Device-Cloud Collaborative Parameters Generation Framework for Efficient Device Model Generalization | Code | 1
MicroNet for Efficient Language Modeling | Code | 1
How to Select One Among All? An Extensive Empirical Study Towards the Robustness of Knowledge Distillation in Natural Language Understanding | Code | 1
Differentiable Model Compression via Pseudo Quantization Noise | Code | 1
Communication-Efficient Diffusion Strategy for Performance Improvement of Federated Learning with Non-IID Data | Code | 1
Distilled Split Deep Neural Networks for Edge-Assisted Real-Time Systems | Code | 1
Activation-Informed Merging of Large Language Models | Code | 1
A Winning Hand: Compressing Deep Networks Can Improve Out-Of-Distribution Robustness | Code | 1
Discrimination-aware Network Pruning for Deep Model Compression | Code | 1
Backdoor Attacks on Federated Learning with Lottery Ticket Hypothesis | Code | 1
CHEX: CHannel EXploration for CNN Model Compression | Code | 1
Distilling Linguistic Context for Language Model Compression | Code | 1
Basic Binary Convolution Unit for Binarized Image Restoration Network | Code | 1
Basis Sharing: Cross-Layer Parameter Sharing for Large Language Model Compression | Code | 1
DQ-BART: Efficient Sequence-to-Sequence Model via Joint Distillation and Quantization | Code | 1
DS-Net++: Dynamic Weight Slicing for Efficient Inference in CNNs and Transformers | Code | 1
Dynamic Channel Pruning: Feature Boosting and Suppression | Code | 1
Dynamic DNNs and Runtime Management for Efficient Inference on Mobile/Embedded Devices | Code | 1
Streamlining Redundant Layers to Compress Large Language Models | Code | 1
BERT-EMD: Many-to-Many Layer Mapping for BERT Compression with Earth Mover's Distance | Code | 1
BERT-of-Theseus: Compressing BERT by Progressive Module Replacing | Code | 1
EarlyBERT: Efficient BERT Training via Early-bird Lottery Tickets | Code | 1
Efficient and Robust Quantization-aware Training via Adaptive Coreset Selection | Code | 1
Efficient Deep Learning: A Survey on Making Deep Learning Models Smaller, Faster, and Better | Code | 1
Efficient On-Device Session-Based Recommendation | Code | 1
Bidirectional Distillation for Top-K Recommender System | Code | 1
Enabling Lightweight Fine-tuning for Pre-trained Language Model Compression based on Matrix Product Operators | Code | 1
Environmental Sound Classification on the Edge: A Pipeline for Deep Acoustic Networks on Extremely Resource-Constrained Devices | Code | 1
LoRAPrune: Structured Pruning Meets Low-Rank Parameter-Efficient Fine-Tuning | Code | 1
Bit-mask Robust Contrastive Knowledge Distillation for Unsupervised Semantic Hashing | Code | 1
Hyper-Compression: Model Compression via Hyperfunction | Code | 1
Towards Compact Neural Networks via End-to-End Training: A Bayesian Tensor Approach with Automatic Rank Determination | Code | 1
Improve Object Detection with Feature-based Knowledge Distillation: Towards Accurate and Efficient Detectors | Code | 1
Enhancing Cross-Tokenizer Knowledge Distillation with Contextual Dynamical Mapping | Code | 1
Knowledge Distillation Meets Self-Supervision | Code | 1
Masking Adversarial Damage: Finding Adversarial Saliency for Robust and Sparse Network | Code | 1
Page 4 of 28

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | MobileBERT + 2bit-1dim model compression using DKM | Accuracy | 82.13 | | Unverified
2 | MobileBERT + 1bit-1dim model compression using DKM | Accuracy | 63.17 | | Unverified