SOTAVerified

Model Compression

Model Compression has been an actively pursued area of research over the last few years, with the goal of deploying state-of-the-art deep networks on low-power, resource-limited devices without a significant drop in accuracy. Parameter pruning, low-rank factorization, and weight quantization are among the methods proposed to reduce the size of deep networks.

Source: KD-MRI: A knowledge distillation framework for image reconstruction and image restoration in MRI workflow
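As a concrete illustration of the three techniques named above, here is a minimal NumPy sketch (not taken from any of the listed papers; all function names and thresholds are illustrative) of global magnitude pruning, rank-r low-rank factorization via SVD, and uniform symmetric quantization:

```python
import numpy as np

def magnitude_prune(w, sparsity=0.5):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    k = int(w.size * sparsity)
    if k == 0:
        return w.copy()
    # k-th smallest absolute value serves as the pruning threshold
    threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    return np.where(np.abs(w) <= threshold, 0.0, w)

def low_rank_approx(w, rank):
    """Best rank-`rank` approximation of a weight matrix via truncated SVD."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank]

def quantize_uniform(w, bits=8):
    """Uniform symmetric quantization to `bits` bits, then dequantize."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale  # dequantized weights, error bounded by scale / 2

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))
w_pruned = magnitude_prune(w, sparsity=0.5)   # half the weights become zero
w_lowrank = low_rank_approx(w, rank=2)        # 2 * (4 + 4) stored values vs. 16
w_quant = quantize_uniform(w, bits=8)         # 8 bits per weight vs. 32/64
```

In practice these are applied to trained layer weights and usually followed by fine-tuning to recover accuracy; the sketch only shows the compression step itself.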

Papers

Showing 601-650 of 1356 papers

Title | Status | Hype
Greener yet Powerful: Taming Large Code Generation Models with Quantization |  | 0
Group channel pruning and spatial attention distilling for object detection |  | 0
GroupReduce: Block-Wise Low-Rank Approximation for Neural Language Model Shrinking |  | 0
Atrial Fibrillation Detection Using Weight-Pruned, Log-Quantised Convolutional Neural Networks |  | 0
Conditional Automated Channel Pruning for Deep Neural Networks |  | 0
HadaNets: Flexible Quantization Strategies for Neural Networks |  | 0
HALOC: Hardware-Aware Automatic Low-Rank Compression for Compact Neural Networks |  | 0
A flexible, extensible software framework for model compression based on the LC algorithm |  | 0
HCE: Improving Performance and Efficiency with Heterogeneously Compressed Neural Network Ensemble |  | 0
Investigation of Practical Aspects of Single Channel Speech Separation for ASR |  | 0
HFSP: A Hardware-friendly Soft Pruning Framework for Vision Transformers |  | 0
HideNseek: Federated Lottery Ticket via Server-side Pruning and Sign Supermask |  | 0
Cross-Channel Intragroup Sparsity Neural Network |  | 0
Attention Sinks and Outlier Features: A 'Catch, Tag, and Release' Mechanism for Embeddings |  | 0
ConaCLIP: Exploring Distillation of Fully-Connected Knowledge Interaction Graph for Lightweight Text-Image Retrieval |  | 0
HODEC: Towards Efficient High-Order DEcomposed Convolutional Neural Networks |  | 0
Formalizing Generalization and Robustness of Neural Networks to Weight Perturbations |  | 0
How and When Adversarial Robustness Transfers in Knowledge Distillation? |  | 0
Aerial Image Classification in Scarce and Unconstrained Environments via Conformal Prediction |  | 0
Deep Face Recognition Model Compression via Knowledge Transfer and Distillation |  | 0
How to Explain Neural Networks: an Approximation Perspective |  | 0
How to Select One Among All ? An Empirical Study Towards the Robustness of Knowledge Distillation in Natural Language Understanding |  | 0
Formalizing Generalization and Adversarial Robustness of Neural Networks to Weight Perturbations |  | 0
Redundancy and Concept Analysis for Code-trained Language Models |  | 0
CURing Large Models: Compression via CUR Decomposition |  | 0
Huff-LLM: End-to-End Lossless Compression for Efficient LLM Inference |  | 0
SwiftPrune: Hessian-Free Weight Pruning for Large Language Models |  | 0
D^2MoE: Dual Routing and Dynamic Scheduling for Efficient On-Device MoE-based LLM Serving |  | 0
FoldGPT: Simple and Effective Large Language Model Compression Scheme |  | 0
DARB: A Density-Aware Regular-Block Pruning for Deep Neural Networks |  | 0
ICD-Face: Intra-class Compactness Distillation for Face Recognition |  | 0
Identifying Sub-networks in Neural Networks via Functionally Similar Representations |  | 0
ILMPQ : An Intra-Layer Multi-Precision Deep Neural Network Quantization framework for FPGA |  | 0
DarkRank: Accelerating Deep Metric Learning via Cross Sample Similarities Transfer |  | 0
FLOPs as a Direct Optimization Objective for Learning Sparse Neural Networks |  | 0
Impact of Disentanglement on Pruning Neural Networks |  | 0
Implicit Neural Representation for Videos Based on Residual Connection |  | 0
A Survey on Drowsiness Detection -- Modern Applications and Methods |  | 0
Computation-efficient Deep Learning for Computer Vision: A Survey |  | 0
A Compact DNN: Approaching GoogLeNet-Level Accuracy of Classification and Domain Adaptation |  | 0
Improve Knowledge Distillation via Label Revision and Data Selection |  | 0
Interpreting Deep Classifier by Visual Distillation of Dark Knowledge |  | 0
Intrinsically Sparse Long Short-Term Memory Networks |  | 0
Improving Knowledge Distillation for BERT Models: Loss Functions, Mapping Methods, and Weight Tuning |  | 0
Is Quantum Optimization Ready? An Effort Towards Neural Network Compression using Adiabatic Quantum Computing |  | 0
FlatENN: Train Flat for Enhanced Fault Tolerance of Quantized Deep Neural Networks |  | 0
FIT: A Metric for Model Sensitivity |  | 0
In defense of parameter sharing for model-compression |  | 0
Individual Content and Motion Dynamics Preserved Pruning for Video Diffusion Models |  | 0
Compress then Serve: Serving Thousands of LoRA Adapters with Little Overhead |  | 0
Page 13 of 28

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | MobileBERT + 2bit-1dim model compression using DKM | Accuracy | 82.13 |  | Unverified
2 | MobileBERT + 1bit-1dim model compression using DKM | Accuracy | 63.17 |  | Unverified