Model Compression

Model compression has been an actively pursued area of research in recent years, with the goal of deploying state-of-the-art deep networks on low-power, resource-limited devices without a significant drop in accuracy. Parameter pruning, low-rank factorization, and weight quantization are among the methods proposed to reduce the size of deep networks.

Source: KD-MRI: A knowledge distillation framework for image reconstruction and image restoration in MRI workflow
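
To make two of the techniques named above concrete, here is a minimal NumPy sketch of magnitude pruning and SVD-based low-rank factorization applied to a toy weight matrix. The matrix shape, sparsity level, and rank below are arbitrary illustrative choices, not settings from any paper listed on this page.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 512))  # toy dense weight matrix

# --- Parameter pruning: zero out the smallest-magnitude weights ---
sparsity = 0.9  # fraction of weights to remove
threshold = np.quantile(np.abs(W), sparsity)
W_pruned = np.where(np.abs(W) >= threshold, W, 0.0)
print(f"nonzero fraction after pruning: {np.count_nonzero(W_pruned) / W.size:.3f}")

# --- Low-rank factorization: approximate W by a rank-r product U @ V ---
r = 32
U_full, s, Vt = np.linalg.svd(W, full_matrices=False)
U = U_full[:, :r] * s[:r]  # fold singular values into the left factor
V = Vt[:r, :]
W_lowrank = U @ V          # best rank-r approximation of W (Eckart-Young)

# Storage drops from 256*512 weights to 256*r + r*512.
rel_err = np.linalg.norm(W - W_lowrank) / np.linalg.norm(W)
print(f"params: {W.size} -> {U.size + V.size}, relative error: {rel_err:.3f}")
```

On a random Gaussian matrix the rank-32 reconstruction error is large because its singular values decay slowly; trained weight matrices usually show much faster spectral decay, which is what makes low-rank factorization viable in practice.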

Papers

Showing 701–725 of 1356 papers

Title | Status | Hype
Knowledge Distillation with Multi-granularity Mixture of Priors for Image Super-Resolution | - | 0
A Short Study on Compressing Decoder-Based Language Models | - | 0
Representative Teacher Keys for Knowledge Distillation Model Compression Based on Attention Mechanism for Image Classification | - | 0
The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve? | - | 0
Accelerating Machine Learning Primitives on Commodity Hardware | - | 0
ASER: Activation Smoothing and Error Reconstruction for Large Language Model Quantization | - | 0
Theoretical Guarantees for Low-Rank Compression of Deep Neural Networks | - | 0
Know What You Don't Need: Single-Shot Meta-Pruning for Attention Heads | - | 0
KroneckerBERT: Learning Kronecker Decomposition for Pre-trained Language Models via Knowledge Distillation | - | 0
KroneckerBERT: Significant Compression of Pre-trained Language Models Through Kronecker Decomposition and Knowledge Distillation | - | 0
Kronecker Decomposition for GPT Compression | - | 0
L4Q: Parameter Efficient Quantization-Aware Fine-Tuning on Large Language Models | - | 0
LadaBERT: Lightweight Adaptation of BERT through Hybrid Model Compression | - | 0
Language model compression with weighted low-rank factorization | - | 0
The Potential of AutoML for Recommender Systems | - | 0
Large Language Model Compression via the Nested Activation-Aware Decomposition | - | 0
Wasserstein Contrastive Representation Distillation | - | 0
Large receptive field strategy and important feature extraction strategy in 3D object detection | - | 0
Large-Scale Generative Data-Free Distillation | - | 0
LatentLLM: Attention-Aware Joint Tensor Compression | - | 0
LayerCollapse: Adaptive compression of neural networks | - | 0
Layer-specific Optimization for Mixed Data Flow with Mixed Precision in FPGA Design for CNN-based Object Detectors | - | 0
LCQ: Low-Rank Codebook based Quantization for Large Language Models | - | 0
A Selective Survey on Versatile Knowledge Distillation Paradigm for Neural Network Models | - | 0
A Scale Mixture Perspective of Multiplicative Noise in Neural Networks | - | 0
Page 29 of 55

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | MobileBERT + 2bit-1dim model compression using DKM | Accuracy | 82.13 | - | Unverified
2 | MobileBERT + 1bit-1dim model compression using DKM | Accuracy | 63.17 | - | Unverified
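
For context on the DKM rows: DKM (differentiable k-means) compresses weights by clustering them into a small shared codebook, so "2bit-1dim" stores a 2-bit centroid index per scalar weight plus a 4-entry codebook. The sketch below is a minimal NumPy illustration of that storage scheme using plain hard-assignment k-means; the actual DKM method makes the cluster assignment differentiable so the clustering can be learned during training, which this sketch does not attempt.

```python
import numpy as np

def kmeans_quantize(w, n_bits=2, n_iters=25):
    """Cluster scalar weights into 2**n_bits shared values (codebook quantization).

    Hard k-means assignment: a sketch of the storage scheme behind
    '2bit-1dim' weight clustering, not the differentiable DKM training step.
    """
    flat = w.ravel()
    k = 2 ** n_bits
    # Spread the initial centroids across the empirical weight distribution.
    centroids = np.quantile(flat, np.linspace(0.0, 1.0, k))
    for _ in range(n_iters):
        # Assign every weight to its nearest centroid ...
        idx = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
        # ... then move each centroid to the mean of its assigned weights.
        for c in range(k):
            members = flat[idx == c]
            if members.size:
                centroids[c] = members.mean()
    idx = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
    return idx.reshape(w.shape).astype(np.uint8), centroids

rng = np.random.default_rng(0)
W = rng.standard_normal((128, 128))  # stand-in for a layer's weights
indices, codebook = kmeans_quantize(W, n_bits=2)
W_q = codebook[indices]              # dequantize: only 4 distinct values
print("codebook:", np.round(codebook, 3))
print("reconstruction MSE:", float(np.mean((W - W_q) ** 2)))
```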