SOTAVerified

Model Compression

Model compression has been an actively pursued area of research in recent years, with the goal of deploying state-of-the-art deep networks on low-power, resource-limited devices without a significant drop in accuracy. Parameter pruning, low-rank factorization, and weight quantization are some of the methods proposed to reduce the size of deep networks.

Source: KD-MRI: A knowledge distillation framework for image reconstruction and image restoration in MRI workflow
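To make the first of those techniques concrete, below is a minimal sketch of unstructured magnitude pruning: the fraction of weights with the smallest absolute values is zeroed out. The function name `magnitude_prune` and the threshold rule are illustrative assumptions, not taken from any paper in the list.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out roughly the `sparsity` fraction of smallest-magnitude weights.

    Illustrative sketch only; ties at the threshold may prune slightly
    more than the requested fraction.
    """
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # The k-th smallest absolute value becomes the pruning threshold.
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask
```

In practice the surviving weights are usually fine-tuned afterwards to recover accuracy, and the binary mask (rather than the dense zeroed matrix) is what yields storage and compute savings on sparse-aware hardware.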

Papers

Showing 601-650 of 1356 papers

Title | Status | Hype
Fast DistilBERT on CPUs | - | 0
COST-EFF: Collaborative Optimization of Spatial and Temporal Efficiency with Slenderized Multi-exit Language Models | Code | 0
Online Cross-Layer Knowledge Distillation on Graph Neural Networks with Deep Supervision | - | 0
Legal-Tech Open Diaries: Lesson learned on how to develop and deploy light-weight models in the era of humongous Language Models | - | 0
Outsourcing Training without Uploading Data via Efficient Collaborative Open-Source Sampling | - | 0
Sub-network Multi-objective Evolutionary Algorithm for Filter Pruning | - | 0
Data-Model-Circuit Tri-Design for Ultra-Light Video Intelligence on Edge Devices | - | 0
FIT: A Metric for Model Sensitivity | - | 0
Parameter-Efficient Masking Networks | Code | 1
Boosting Graph Neural Networks via Adaptive Knowledge Distillation | - | 0
SeKron: A Decomposition Method Supporting Many Factorization Structures | - | 0
Deep learning model compression using network sensitivity and gradients | - | 0
AlphaTuning: Quantization-Aware Parameter-Efficient Adaptation of Large-Scale Pre-Trained Language Models | - | 0
Less is More: Task-aware Layer-wise Distillation for Language Model Compression | Code | 1
Basic Binary Convolution Unit for Binarized Image Restoration Network | Code | 1
Knowledge Distillation with Reptile Meta-Learning for Pretrained Language Model Compression | Code | 0
Multi-stage Progressive Compression of Conformer Transducer for On-device Speech Recognition | - | 0
Match to Win: Analysing Sequences Lengths for Efficient Self-supervised Learning in Speech and Audio | - | 0
Attacking Compressed Vision Transformers | Code | 0
Efficient On-Device Session-Based Recommendation | Code | 1
On-Device Domain Generalization | Code | 2
Analysis of Quantization on MLP-based Vision Models | - | 0
PSAQ-ViT V2: Towards Accurate and General Data-Free Quantization for Vision Transformers | Code | 1
DUET: A Tuning-Free Device-Cloud Collaborative Parameters Generation Framework for Efficient Device Model Generalization | Code | 1
Towards Sparsification of Graph Neural Networks | Code | 0
SaleNet: A low-power end-to-end CNN accelerator for sustained attention level evaluation using EEG | - | 0
Lottery Aware Sparsity Hunting: Enabling Federated Learning on Resource-Limited Edge | Code | 0
Complexity-Driven CNN Compression for Resource-constrained Edge AI | - | 0
Reducing Computational Complexity of Neural Networks in Optical Channel Equalization: From Concepts to Implementation | - | 0
Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning | Code | 1
Robust and Large-Payload DNN Watermarking via Fixed, Distribution-Optimized, Weights | Code | 0
Design Automation for Fast, Lightweight, and Effective Deep Learning Models: A Survey | - | 0
Enhancing Targeted Attack Transferability via Diversified Weight Pruning | - | 0
An Algorithm-Hardware Co-Optimized Framework for Accelerating N:M Sparse Transformers | - | 0
Safety and Performance, Why not Both? Bi-Objective Optimized Model Compression toward AI Software Deployment | Code | 0
Triple Sparsification of Graph Convolutional Networks without Sacrificing the Accuracy | - | 0
Model Blending for Text Classification | - | 0
Quiver neural networks | - | 0
Efficient model compression with Random Operation Access Specific Tile (ROAST) hashing | Code | 0
Model Compression for Resource-Constrained Mobile Robots | - | 0
Towards Lightweight Super-Resolution with Dual Regression Learning | Code | 2
Communication-Efficient Diffusion Strategy for Performance Improvement of Federated Learning with Non-IID Data | Code | 1
T-RECX: Tiny-Resource Efficient Convolutional neural networks with early-eXit | - | 0
Normalized Feature Distillation for Semantic Segmentation | - | 0
3DG-STFM: 3D Geometric Guided Student-Teacher Feature Matching | Code | 1
Rank-Based Filter Pruning for Real-Time UAV Tracking | - | 0
CPrune: Compiler-Informed Model Pruning for Efficient Target-Aware DNN Execution | Code | 1
Quantum Neural Network Compression | - | 0
KroneckerBERT: Significant Compression of Pre-trained Language Models Through Kronecker Decomposition and Knowledge Distillation | - | 0
PCEE-BERT: Accelerating BERT Inference via Patient and Confident Early Exiting | Code | 0
Page 13 of 28

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | MobileBERT + 2bit-1dim model compression using DKM | Accuracy | 82.13 | - | Unverified
2 | MobileBERT + 1bit-1dim model compression using DKM | Accuracy | 63.17 | - | Unverified