
Model Compression

Model Compression has been an actively pursued area of research over the last few years, with the goal of deploying state-of-the-art deep networks on low-power, resource-limited devices without a significant drop in accuracy. Parameter pruning, low-rank factorization, and weight quantization are some of the methods proposed to compress the size of deep networks.

Source: KD-MRI: A knowledge distillation framework for image reconstruction and image restoration in MRI workflow
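As a rough illustration of the three techniques named above, here is a minimal NumPy sketch. The function names and the settings (90% sparsity, rank 32, 8 bits) are illustrative choices, not taken from any paper listed below.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 512)).astype(np.float32)  # a dense weight matrix

# 1. Parameter pruning: zero out the smallest-magnitude weights.
def magnitude_prune(w, sparsity=0.9):
    thresh = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) >= thresh, w, 0.0)

# 2. Low-rank factorization: approximate W by a rank-r product A @ B,
#    storing two thin factors instead of the full matrix.
def low_rank(w, rank=32):
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    return u[:, :rank] * s[:rank], vt[:rank]

# 3. Weight quantization: round weights onto a uniform b-bit grid;
#    dequantize with q * scale.
def uniform_quantize(w, bits=8):
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    q = np.round(w / scale).astype(np.int8)
    return q, scale

sparse_W = magnitude_prune(W)          # 90% of entries become zero
A, B = low_rank(W)                     # 256*32 + 32*512 values vs 256*512
q, scale = uniform_quantize(W)         # int8 storage, one float32 scale
```

Each method trades a controlled amount of approximation error for a smaller storage or compute footprint, and in practice they are often combined.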

Papers

Showing 41-50 of 1356 papers

| Title | Status | Hype |
| --- | --- | --- |
| Towards Faster and More Compact Foundation Models for Molecular Property Prediction | Code | 0 |
| Low-Rank Matrix Approximation for Neural Network Compression | | 0 |
| Aerial Image Classification in Scarce and Unconstrained Environments via Conformal Prediction | | 0 |
| On-Device Qwen2.5: Efficient LLM Inference with Model Compression and Hardware Acceleration | | 0 |
| From Large to Super-Tiny: End-to-End Optimization for Cost-Efficient LLMs | | 0 |
| D^2MoE: Dual Routing and Dynamic Scheduling for Efficient On-Device MoE-based LLM Serving | | 0 |
| ImPart: Importance-Aware Delta-Sparsification for Improved Model Compression and Merging in LLMs | Code | 0 |
| Efficient Reasoning Models: A Survey | Code | 3 |
| Efficient Hybrid Language Model Compression through Group-Aware SSM Pruning | | 0 |
| APSQ: Additive Partial Sum Quantization with Algorithm-Hardware Co-Design | Code | 0 |
Page 5 of 136

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | MobileBERT + 2bit-1dim model compression using DKM | Accuracy | 82.13 | | Unverified |
| 2 | MobileBERT + 1bit-1dim model compression using DKM | Accuracy | 63.17 | | Unverified |
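For context on the DKM entries above: DKM (differentiable k-means clustering) compresses weights by clustering them into a small shared codebook, so "2bit-1dim" means each scalar weight is replaced by an index into 2^2 = 4 centroids. The sketch below shows only the hard-clustering baseline in NumPy; the actual DKM method makes the assignment step soft and differentiable so the codebook can be learned during training, and the names and iteration count here are illustrative.

```python
import numpy as np

def kmeans_quantize(w, bits=2, iters=25):
    """Hard k-means weight clustering: map every scalar weight to one of
    2**bits shared centroid values ("bits-bit, 1-dim" clustering)."""
    flat = w.ravel()
    k = 2 ** bits
    # initialize centroids at evenly spaced quantiles of the weights
    centroids = np.quantile(flat, np.linspace(0.0, 1.0, k))
    for _ in range(iters):
        # assignment step: nearest centroid per weight
        assign = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
        # update step: move each centroid to the mean of its members
        for j in range(k):
            members = flat[assign == j]
            if members.size:
                centroids[j] = members.mean()
    # storage: per-weight 2-bit indices plus a k-entry float codebook
    return centroids[assign].reshape(w.shape), assign, centroids
```

At 2 bits the per-weight storage drops from 32 bits to 2 bits plus a negligible codebook, which is why the verified accuracy gap between the 2-bit and 1-bit rows above is the interesting quantity on this benchmark.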