SOTAVerified

Model Compression

Model compression has been an actively pursued area of research over the last few years, with the goal of deploying state-of-the-art deep networks on low-power, resource-limited devices without a significant drop in accuracy. Parameter pruning, low-rank factorization, and weight quantization are some of the proposed methods for compressing the size of deep networks.

Source: KD-MRI: A knowledge distillation framework for image reconstruction and image restoration in MRI workflow
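
As a rough illustration of the first two methods named above, here is a minimal PyTorch sketch of unstructured magnitude pruning and truncated-SVD low-rank factorization applied to a single weight matrix. The function names and values are illustrative only, not taken from any paper listed below; a weight-quantization sketch follows the benchmark table at the end of this page.

```python
import torch

def magnitude_prune(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Zero out the smallest-magnitude fraction of weights (unstructured pruning)."""
    k = int(weight.numel() * sparsity)
    if k == 0:
        return weight.clone()
    # k-th smallest absolute value becomes the pruning threshold
    threshold = weight.abs().flatten().kthvalue(k).values
    return weight * (weight.abs() > threshold)

def low_rank_factorize(weight: torch.Tensor, rank: int):
    """Replace an (m, n) weight matrix with a rank-r product A @ B via truncated SVD."""
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # (m, r), singular values folded into A
    B = Vh[:rank, :]             # (r, n)
    return A, B

w = torch.randn(256, 256)
pruned = magnitude_prune(w, sparsity=0.9)
A, B = low_rank_factorize(w, rank=32)
print(f"nonzero fraction after pruning: {pruned.count_nonzero().item() / pruned.numel():.3f}")
print(f"params: full={w.numel()}, rank-32 factors={A.numel() + B.numel()}")
```

Both sketches shrink storage at the cost of some accuracy: pruning stores a sparse mask over the original weights, while the rank-32 factorization replaces 256 x 256 = 65,536 parameters with 2 x (256 x 32) = 16,384.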

Papers

Showing 576–600 of 1356 papers

Title | Status | Hype
Frustratingly Easy Model Ensemble for Abstractive Summarization | – | 0
From Word Vectors to Multimodal Embeddings: Techniques, Applications, and Future Directions For Large Language Models | – | 0
Fundamental Limits of Communication Efficiency for Model Aggregation in Distributed Learning: A Rate-Distortion Approach | – | 0
From Large to Super-Tiny: End-to-End Optimization for Cost-Efficient LLMs | – | 0
GDP: Stabilized Neural Network Pruning via Gates with Differentiable Polarization | – | 0
GECKO: Reconciling Privacy, Accuracy and Efficiency in Embedded Deep Learning | – | 0
GeneCAI: Genetic Evolution for Acquiring Compact AI | – | 0
Conditional Teacher-Student Learning | – | 0
Conditional Generative Data-free Knowledge Distillation | – | 0
From Cloud to Edge: Rethinking Generative AI for Low-Resource Design Challenges | – | 0
A Survey on Green Deep Learning | – | 0
Convolutional Neural Network Compression Based on Low-Rank Decomposition | – | 0
From Algorithm to Hardware: A Survey on Efficient and Safe Deployment of Deep Neural Networks | – | 0
Geometry is All You Need: A Unified Taxonomy of Matrix and Tensor Factorization for Compression of Generative Language Models | – | 0
Fragile Mastery: Are Domain-Specific Trade-Offs Undermining On-Device Language Models? | – | 0
Conditional Automated Channel Pruning for Deep Neural Networks | – | 0
A flexible, extensible software framework for model compression based on the LC algorithm | – | 0
Go Wide, Then Narrow: Efficient Training of Deep Thin Networks | – | 0
HODEC: Towards Efficient High-Order DEcomposed Convolutional Neural Networks | – | 0
ConaCLIP: Exploring Distillation of Fully-Connected Knowledge Interaction Graph for Lightweight Text-Image Retrieval | – | 0
GQSA: Group Quantization and Sparsity for Accelerating Large Language Model Inference | – | 0
Gradient-Free Structured Pruning with Unlabeled Data | – | 0
Atleus: Accelerating Transformers on the Edge Enabled by 3D Heterogeneous Manycore Architectures | – | 0
Graph-Adaptive Pruning for Efficient Inference of Convolutional Neural Networks | – | 0
Formalizing Generalization and Robustness of Neural Networks to Weight Perturbations | – | 0
Page 24 of 55

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | MobileBERT + 2bit-1dim model compression using DKM | Accuracy | 82.13 | – | Unverified
2 | MobileBERT + 1bit-1dim model compression using DKM | Accuracy | 63.17 | – | Unverified
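
DKM in the entries above presumably refers to differentiable k-means weight clustering, with "2bit-1dim" read here as a 4-entry codebook over scalar (one-dimensional) weights. The sketch below is a minimal illustration of that idea under those assumptions, not the benchmarked implementation; dkm_palettize and its defaults are invented for this example.

```python
import torch

def dkm_palettize(weights: torch.Tensor, bits: int = 2,
                  iters: int = 20, temperature: float = 0.05):
    """Cluster flattened scalar weights into 2**bits shared values via soft k-means."""
    K = 2 ** bits
    # Initialize the codebook from quantiles of the weight distribution.
    centroids = torch.quantile(weights, torch.linspace(0.0, 1.0, K))
    for _ in range(iters):
        # Soft assignment: softmax over negative |w - c| distances keeps the
        # clustering differentiable, so it could be trained end-to-end.
        logits = -(weights[:, None] - centroids[None, :]).abs() / temperature
        attn = torch.softmax(logits, dim=1)                       # (N, K)
        # Update each centroid as the attention-weighted mean of the weights.
        centroids = (attn * weights[:, None]).sum(0) / attn.sum(0).clamp_min(1e-12)
    codes = attn.argmax(dim=1)   # hard low-bit code per weight, taken only at the end
    return centroids[codes], centroids, codes

w = torch.randn(10_000)
w_q, codebook, codes = dkm_palettize(w, bits=2)
print("codebook:", codebook)   # four shared values; store 2-bit codes + codebook
```

Storing a 2-bit code per weight plus a small floating-point codebook in place of 32-bit weights gives roughly a 16x reduction in weight storage, which is the trade-off the accuracy numbers above measure.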