SOTAVerified

Model Compression

Model compression has been an actively pursued area of research over the last few years, with the goal of deploying state-of-the-art deep networks on low-power, resource-limited devices without a significant drop in accuracy. Parameter pruning, low-rank factorization, and weight quantization are among the methods proposed to reduce the size of deep networks.

Source: KD-MRI: A knowledge distillation framework for image reconstruction and image restoration in MRI workflow
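The three compression techniques named above can each be illustrated on a single dense weight matrix. The sketch below is a minimal NumPy illustration, not any specific paper's method: it prunes the smallest-magnitude 90% of weights, builds a rank-32 factorization via truncated SVD, and applies symmetric 8-bit linear quantization. The shapes, sparsity level, rank, and bit width are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 512)).astype(np.float32)  # a dense layer's weights

# --- Parameter pruning: zero out the smallest-magnitude 90% of weights ---
threshold = np.quantile(np.abs(W), 0.9)
W_pruned = np.where(np.abs(W) >= threshold, W, 0.0)

# --- Low-rank factorization: approximate W with rank-r factors U @ V ---
r = 32
U_, S, Vt = np.linalg.svd(W, full_matrices=False)
U = U_[:, :r] * S[:r]   # (256, r), singular values folded into U
V = Vt[:r, :]           # (r, 512)
W_lowrank = U @ V       # stored params drop from 256*512 to r*(256+512)

# --- Weight quantization: symmetric 8-bit linear quantization ---
scale = np.abs(W).max() / 127.0
W_q = np.clip(np.round(W / scale), -127, 127).astype(np.int8)
W_dequant = W_q.astype(np.float32) * scale  # reconstruction error <= scale/2
```

In practice these techniques are usually combined with fine-tuning (or, for distillation-based methods, with a teacher network) to recover the accuracy lost to the approximation.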

Papers

Showing 301–350 of 1356 papers

Title | Status | Hype
Divergent Token Metrics: Measuring degradation to prune away LLM components -- and optimize quantization | – | 0
BioNetExplorer: Architecture-Space Exploration of Bio-Signal Processing Deep Neural Networks for Wearables | – | 0
An Efficient Method of Training Small Models for Regression Problems with Knowledge Distillation | – | 0
AdaKD: Dynamic Knowledge Distillation of ASR models using Adaptive Loss Weighting | – | 0
An Effective Information Theoretic Framework for Channel Pruning | – | 0
Distilling with Performance Enhanced Students | – | 0
Distributed Low Precision Training Without Mixed Precision | – | 0
DKM: Differentiable K-Means Clustering Layer for Neural Network Compression | – | 0
DMT: Comprehensive Distillation with Multiple Self-supervised Teachers | – | 0
Bias in Pruned Vision Models: In-Depth Analysis and Countermeasures | – | 0
An Automatic and Efficient BERT Pruning for Edge AI Systems | – | 0
Beyond the Tip of Efficiency: Uncovering the Submerged Threats of Jailbreak Attacks in Small Language Models | – | 0
Analysis of Quantization on MLP-based Vision Models | – | 0
AdaDeep: A Usage-Driven, Automated Deep Model Compression Framework for Enabling Ubiquitous Intelligent Mobiles | – | 0
Compress and Compare: Interactively Evaluating Efficiency and Behavior Across ML Model Compression Experiments | – | 0
Beware of Calibration Data for Pruning Large Language Models | – | 0
Analysis of memory consumption by neural networks based on hyperparameters | – | 0
Benchmarking Adversarial Robustness of Compressed Deep Learning Models | – | 0
An Algorithm-Hardware Co-Optimized Framework for Accelerating N:M Sparse Transformers | – | 0
ACAM-KD: Adaptive and Cooperative Attention Masking for Knowledge Distillation | – | 0
BD-KD: Balancing the Divergences for Online Knowledge Distillation | – | 0
An Efficient Real-Time Object Detection Framework on Resource-Constricted Hardware Devices via Software and Hardware Co-design | – | 0
Activation Sparsity Opportunities for Compressing General Large Language Models | – | 0
Bayesian Federated Model Compression for Communication and Computation Efficiency | – | 0
Bayesian Deep Learning Via Expectation Maximization and Turbo Deep Approximate Message Passing | – | 0
A Model Compression Method with Matrix Product Operators for Speech Enhancement | – | 0
A Mixed Integer Programming Approach for Verifying Properties of Binarized Neural Networks | – | 0
Balancing Specialization, Generalization, and Compression for Detection and Tracking | – | 0
Balancing Cost and Benefit with Tied-Multi Transformers | – | 0
Activation Map Adaptation for Effective Knowledge Distillation | – | 0
Single-path Bit Sharing for Automatic Loss-aware Model Compression | – | 0
Distilling Inductive Bias: Knowledge Distillation Beyond Model Compression | – | 0
Extending DeepSDF for automatic 3D shape retrieval and similarity transform estimation | – | 0
A Memory-Efficient Learning Framework for Symbol-Level Precoding with Quantized NN Weights | – | 0
AMD: Automatic Multi-step Distillation of Large-scale Vision Models | – | 0
Deep Model Compression Via Two-Stage Deep Reinforcement Learning | – | 0
Deep Model Compression: Distilling Knowledge from Noisy Teachers | – | 0
Deep Model Compression based on the Training History | – | 0
A Web-Based Solution for Federated Learning with LLM-Based Automation | – | 0
AutoCompress: An Automatic DNN Structured Pruning Framework for Ultra-High Compression Rates | – | 0
Deep learning model compression using network sensitivity and gradients | – | 0
AMD: Adaptive Masked Distillation for Object Detection | – | 0
DEEPEYE: A Compact and Accurate Video Comprehension at Terminal Devices Compressed with Quantization and Tensorization | – | 0
Activation Density based Mixed-Precision Quantization for Energy Efficient Neural Networks | – | 0
Discrete Model Compression With Resource Constraint for Deep Neural Networks | – | 0
Neural Epitome Search for Architecture-Agnostic Network Compression | – | 0
AWP: Activation-Aware Weight Pruning and Quantization with Projected Gradient Descent | – | 0
DeepRebirth: Accelerating Deep Neural Network Execution on Mobile Devices | – | 0
Automatic Mixed-Precision Quantization Search of BERT | – | 0
Deep Compression of Neural Networks for Fault Detection on Tennessee Eastman Chemical Processes | – | 0
Page 7 of 28

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | MobileBERT + 2bit-1dim model compression using DKM | Accuracy | 82.13 | – | Unverified
2 | MobileBERT + 1bit-1dim model compression using DKM | Accuracy | 63.17 | – | Unverified
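The "2bit-1dim" and "1bit-1dim" DKM entries refer to clustering each scalar (1-dimensional) weight into 2^b shared centroids, so every weight is stored as a b-bit index plus a small codebook. The sketch below uses plain hard k-means as a stand-in; DKM itself uses a differentiable, attention-based soft assignment trained end-to-end, which this does not reproduce. The function name and iteration count are illustrative.

```python
import numpy as np

def kmeans_weight_sharing(w, bits=2, iters=20):
    """Cluster scalar weights into 2**bits shared centroids (hard k-means).

    Returns the clustered weights (each entry replaced by its nearest
    centroid) and the codebook of centroids.
    """
    k = 2 ** bits
    flat = w.ravel()
    # initialize centroids evenly over the weight range
    centroids = np.linspace(flat.min(), flat.max(), k)
    for _ in range(iters):
        # assign each weight to its nearest centroid, then recompute means
        assign = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
        for j in range(k):
            members = flat[assign == j]
            if members.size:
                centroids[j] = members.mean()
    assign = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
    return centroids[assign].reshape(w.shape), centroids

rng = np.random.default_rng(1)
w = rng.standard_normal((64, 64)).astype(np.float32)
w_clustered, codebook = kmeans_weight_sharing(w, bits=2)
# w_clustered contains at most 4 distinct values, so each weight
# can be stored as a 2-bit index into the 4-entry codebook
```

The accuracy gap between the two table rows (82.13 at 2 bits vs. 63.17 at 1 bit) reflects the sharply reduced codebook: 1-bit clustering leaves only two shared values for every weight.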