
Model Compression

Model compression has been an actively pursued research area in recent years, with the goal of deploying state-of-the-art deep networks on low-power, resource-limited devices without a significant drop in accuracy. Parameter pruning, low-rank factorization, and weight quantization are among the methods proposed to reduce the size of deep networks.

Source: KD-MRI: A knowledge distillation framework for image reconstruction and image restoration in MRI workflow
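
For orientation, the sketch below is a minimal NumPy illustration of the three technique families named above, applied to a single random weight matrix: magnitude-based parameter pruning, a rank-r SVD factorization, and uniform 8-bit weight quantization. The matrix shape, 90% sparsity, rank 32, and 8-bit width are illustrative assumptions, not settings from any paper listed on this page.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 512)).astype(np.float32)  # a dense weight matrix

# 1. Magnitude pruning: zero out the 90% of weights with smallest absolute value.
threshold = np.quantile(np.abs(W), 0.90)
W_pruned = np.where(np.abs(W) >= threshold, W, 0.0)

# 2. Low-rank factorization: approximate W by a rank-r product,
#    storing r*(m+n) values instead of m*n.
r = 32
U, S, Vt = np.linalg.svd(W, full_matrices=False)
W_lowrank = (U[:, :r] * S[:r]) @ Vt[:r, :]

# 3. Uniform 8-bit quantization: map floats to int8 with one scale factor.
scale = np.abs(W).max() / 127.0
W_int8 = np.round(W / scale).astype(np.int8)   # compact storage
W_dequant = W_int8.astype(np.float32) * scale  # values used at inference

# Relative reconstruction error of each compressed variant.
for name, approx in [("pruned", W_pruned), ("rank-32", W_lowrank), ("int8", W_dequant)]:
    err = np.linalg.norm(W - approx) / np.linalg.norm(W)
    print(f"{name:>8}: relative error {err:.3f}")
```

Printing the relative reconstruction error for each variant gives a rough feel for the size/fidelity trade-off each method makes on a single layer; in practice the network is usually fine-tuned or retrained after compression to recover accuracy.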

Papers

Showing 51–75 of 1356 papers

Title | Status | Hype
Knowledge Distillation with Refined Logits | Code | 1
Composable Interventions for Language Models | Code | 1
Pruning via Merging: Compressing LLMs via Manifold Alignment Based Layer Merging | Code | 1
LiteYOLO-ID: A Lightweight Object Detection Network for Insulator Defect Detection | Code | 1
Examining Post-Training Quantization for Mixture-of-Experts: A Benchmark | Code | 1
Transferable and Principled Efficiency for Open-Vocabulary Segmentation | Code | 1
Streamlining Redundant Layers to Compress Large Language Models | Code | 1
PYRA: Parallel Yielding Re-Activation for Training-Inference Efficient Task Adaptation | Code | 1
Bit-mask Robust Contrastive Knowledge Distillation for Unsupervised Semantic Hashing | Code | 1
"Lossless" Compression of Deep Neural Networks: A High-dimensional Neural Tangent Kernel Approach | Code | 1
PromptKD: Distilling Student-Friendly Knowledge for Generative Language Models via Prompt Tuning | Code | 1
Fast Vocabulary Transfer for Language Model Compression | Code | 1
Faster and Lighter LLMs: A Survey on Current Challenges and Way Forward | Code | 1
Communication-Efficient Federated Learning through Adaptive Weight Clustering and Server-Side Distillation | Code | 1
Dynamic DNNs and Runtime Management for Efficient Inference on Mobile/Embedded Devices | Code | 1
Retraining-free Model Quantization via One-Shot Weight-Coupling Learning | Code | 1
Generative Model-based Feature Knowledge Distillation for Action Recognition | Code | 1
Rethinking Compression: Reduced Order Modelling of Latent Features in Large Language Models | Code | 1
LQ-LoRA: Low-rank Plus Quantized Matrix Decomposition for Efficient Language Model Finetuning | Code | 1
An Empirical Study of CLIP for Text-based Person Search | Code | 1
Accurate Retraining-free Pruning for Pretrained Encoder-based Language Models | Code | 1
Quantization Variation: A New Perspective on Training Transformers with Low-Bit Precision | Code | 1
Constraint-aware and Ranking-distilled Token Pruning for Efficient Transformer Inference | Code | 1
CrossKD: Cross-Head Knowledge Distillation for Object Detection | Code | 1
HiNeRV: Video Compression with Hierarchical Encoding-based Neural Representation | Code | 1
Page 3 of 55

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | MobileBERT + 2bit-1dim model compression using DKM | Accuracy | 82.13 | – | Unverified
2 | MobileBERT + 1bit-1dim model compression using DKM | Accuracy | 63.17 | – | Unverified