
Model Compression

Model compression has been an actively pursued area of research over the last few years, with the goal of deploying state-of-the-art deep networks on low-power, resource-limited devices without a significant drop in accuracy. Parameter pruning, low-rank factorization, and weight quantization are among the methods proposed to reduce the size of deep networks.

Source: KD-MRI: A knowledge distillation framework for image reconstruction and image restoration in MRI workflow
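To make the three techniques named above concrete, here is a minimal NumPy sketch (not from the source) that applies magnitude pruning, SVD-based low-rank factorization, and codebook quantization to a toy weight matrix. All sizes, the sparsity level, the rank, and the bit width are illustrative assumptions, not values from any of the listed papers.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 256)).astype(np.float32)  # toy dense-layer weights (assumed size)

# Parameter pruning: zero out the smallest-magnitude weights.
sparsity = 0.9  # illustrative: keep only the top 10% of weights by magnitude
threshold = np.quantile(np.abs(W), sparsity)
W_pruned = np.where(np.abs(W) >= threshold, W, 0.0)

# Low-rank factorization: approximate W with two thin factors via truncated SVD.
rank = 32  # illustrative rank
U, S, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :rank] * S[:rank]   # (256, rank)
B = Vt[:rank, :]             # (rank, 256); storing A and B costs 2*256*rank vs 256*256 params
W_lowrank = A @ B

# Weight quantization: map each weight to one of 2^bits codebook values.
bits = 2  # illustrative bit width
levels = 2 ** bits
edges = np.quantile(W, np.linspace(0.0, 1.0, levels + 1))
codes = np.digitize(W, edges[1:-1])  # per-weight index in [0, levels)
codebook = np.array([W[codes == k].mean() for k in range(levels)])
W_quant = codebook[codes]            # store the small codes plus a tiny codebook

print(f"pruned nonzeros: {np.count_nonzero(W_pruned)} / {W.size}")
print(f"low-rank error:  {np.linalg.norm(W - W_lowrank) / np.linalg.norm(W):.3f}")
print(f"quant error:     {np.linalg.norm(W - W_quant) / np.linalg.norm(W):.3f}")
```

Each method trades accuracy for storage differently: pruning yields sparse weights, factorization replaces one matrix with two smaller ones, and quantization replaces floats with short integer codes.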

Papers

Showing 621–630 of 1356 papers (page 63 of 136)

Title | Status | Hype
How to Explain Neural Networks: an Approximation Perspective | | 0
How to Select One Among All? An Empirical Study Towards the Robustness of Knowledge Distillation in Natural Language Understanding | | 0
Aerial Image Classification in Scarce and Unconstrained Environments via Conformal Prediction | | 0
Formalizing Generalization and Adversarial Robustness of Neural Networks to Weight Perturbations | | 0
CURing Large Models: Compression via CUR Decomposition | | 0
Huff-LLM: End-to-End Lossless Compression for Efficient LLM Inference | | 0
SwiftPrune: Hessian-Free Weight Pruning for Large Language Models | | 0
D^2MoE: Dual Routing and Dynamic Scheduling for Efficient On-Device MoE-based LLM Serving | | 0
Improving Knowledge Distillation for BERT Models: Loss Functions, Mapping Methods, and Weight Tuning | | 0
FoldGPT: Simple and Effective Large Language Model Compression Scheme | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | MobileBERT + 2bit-1dim model compression using DKM | Accuracy | 82.13 | | Unverified
2 | MobileBERT + 1bit-1dim model compression using DKM | Accuracy | 63.17 | | Unverified