SOTAVerified

Model Compression

Model compression has been an actively pursued area of research over the last few years, with the goal of deploying state-of-the-art deep networks on low-power, resource-limited devices without a significant drop in accuracy. Parameter pruning, low-rank factorization, and weight quantization are some of the methods proposed to reduce the size of deep networks.

Source: KD-MRI: A knowledge distillation framework for image reconstruction and image restoration in MRI workflow
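As a rough illustration of two of the techniques named in the description above, the sketch below applies magnitude-based weight pruning followed by symmetric uniform 8-bit quantization to a NumPy weight matrix. It is a minimal toy example under our own assumptions (function names, sparsity level, and bit width are illustrative), not code from any of the listed papers.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights so that `sparsity` fraction becomes zero."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value acts as the pruning threshold.
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

def uniform_quantize(weights: np.ndarray, num_bits: int = 8):
    """Symmetric uniform quantization: returns signed integer codes and a scale factor."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = float(np.max(np.abs(weights)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = np.clip(np.round(weights / scale), -qmax, qmax).astype(np.int8)
    return codes, scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(256, 256)).astype(np.float32)
    w_pruned = magnitude_prune(w, sparsity=0.9)        # keep only the largest 10% of weights
    codes, scale = uniform_quantize(w_pruned, num_bits=8)
    w_restored = codes.astype(np.float32) * scale       # dequantized approximation
    print("nonzero fraction:", np.count_nonzero(w_pruned) / w_pruned.size)
    print("max abs reconstruction error:", np.max(np.abs(w_pruned - w_restored)))
```

In practice such a pruned-and-quantized model would be fine-tuned afterwards to recover accuracy; the sketch only shows the compression step itself.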

Papers

Showing 51–100 of 1356 papers

Title | Status | Hype
Knowledge Distillation with Refined Logits | Code | 1
Composable Interventions for Language Models | Code | 1
Pruning via Merging: Compressing LLMs via Manifold Alignment Based Layer Merging | Code | 1
LiteYOLO-ID: A Lightweight Object Detection Network for Insulator Defect Detection | Code | 1
Examining Post-Training Quantization for Mixture-of-Experts: A Benchmark | Code | 1
Transferable and Principled Efficiency for Open-Vocabulary Segmentation | Code | 1
Streamlining Redundant Layers to Compress Large Language Models | Code | 1
PYRA: Parallel Yielding Re-Activation for Training-Inference Efficient Task Adaptation | Code | 1
Bit-mask Robust Contrastive Knowledge Distillation for Unsupervised Semantic Hashing | Code | 1
"Lossless" Compression of Deep Neural Networks: A High-dimensional Neural Tangent Kernel Approach | Code | 1
PromptKD: Distilling Student-Friendly Knowledge for Generative Language Models via Prompt Tuning | Code | 1
Fast Vocabulary Transfer for Language Model Compression | Code | 1
Faster and Lighter LLMs: A Survey on Current Challenges and Way Forward | Code | 1
Communication-Efficient Federated Learning through Adaptive Weight Clustering and Server-Side Distillation | Code | 1
Dynamic DNNs and Runtime Management for Efficient Inference on Mobile/Embedded Devices | Code | 1
Retraining-free Model Quantization via One-Shot Weight-Coupling Learning | Code | 1
Generative Model-based Feature Knowledge Distillation for Action Recognition | Code | 1
Rethinking Compression: Reduced Order Modelling of Latent Features in Large Language Models | Code | 1
LQ-LoRA: Low-rank Plus Quantized Matrix Decomposition for Efficient Language Model Finetuning | Code | 1
An Empirical Study of CLIP for Text-based Person Search | Code | 1
Accurate Retraining-free Pruning for Pretrained Encoder-based Language Models | Code | 1
Quantization Variation: A New Perspective on Training Transformers with Low-Bit Precision | Code | 1
Constraint-aware and Ranking-distilled Token Pruning for Efficient Transformer Inference | Code | 1
CrossKD: Cross-Head Knowledge Distillation for Object Detection | Code | 1
HiNeRV: Video Compression with Hierarchical Encoding-based Neural Representation | Code | 1
Efficient and Robust Quantization-aware Training via Adaptive Coreset Selection | Code | 1
MobileNMT: Enabling Translation in 15MB and 30ms | Code | 1
LoRAPrune: Structured Pruning Meets Low-Rank Parameter-Efficient Fine-Tuning | Code | 1
COMCAT: Towards Efficient Compression and Customization of Attention-Based Vision Models | Code | 1
An Efficient Multilingual Language Model Compression through Vocabulary Trimming | Code | 1
AD-KD: Attribution-Driven Knowledge Distillation for Language Model Compression | Code | 1
Class Attention Transfer Based Knowledge Distillation | Code | 1
Performance-aware Approximation of Global Channel Pruning for Multitask CNNs | Code | 1
The Tiny Time-series Transformer: Low-latency High-throughput Classification of Astronomical Transients using Deep Model Compression | Code | 1
Structured Pruning of Self-Supervised Pre-trained Models for Speech Recognition and Understanding | Code | 1
Dual Relation Knowledge Distillation for Object Detection | Code | 1
UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers | Code | 1
Compression-Aware Video Super-Resolution | Code | 1
FFNeRV: Flow-Guided Frame-Wise Neural Representations for Videos | Code | 1
RepQ-ViT: Scale Reparameterization for Post-Training Quantization of Vision Transformers | Code | 1
FedUKD: Federated UNet Model with Knowledge Distillation for Land Use Classification from Satellite and Street Views | Code | 1
Discovering Dynamic Patterns from Spatiotemporal Data with Time-Varying Low-Rank Autoregression | Code | 1
Unbiased Knowledge Distillation for Recommendation | Code | 1
Sparse Probabilistic Circuits via Pruning and Growing | Code | 1
Parameter-Efficient Masking Networks | Code | 1
Less is More: Task-aware Layer-wise Distillation for Language Model Compression | Code | 1
Basic Binary Convolution Unit for Binarized Image Restoration Network | Code | 1
Efficient On-Device Session-Based Recommendation | Code | 1
PSAQ-ViT V2: Towards Accurate and General Data-Free Quantization for Vision Transformers | Code | 1
DUET: A Tuning-Free Device-Cloud Collaborative Parameters Generation Framework for Efficient Device Model Generalization | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | MobileBERT + 2bit-1dim model compression using DKM | Accuracy | 82.13 | - | Unverified
2 | MobileBERT + 1bit-1dim model compression using DKM | Accuracy | 63.17 | - | Unverified