SOTAVerified

Model Compression

Model Compression has been an actively pursued area of research over the last few years, with the goal of deploying state-of-the-art deep networks on low-power, resource-limited devices without a significant drop in accuracy. Parameter pruning, low-rank factorization, and weight quantization are among the methods proposed to reduce the size of deep networks.

Source: KD-MRI: A knowledge distillation framework for image reconstruction and image restoration in MRI workflow
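As a rough illustration of two of the techniques named above, the sketch below shows magnitude-based parameter pruning (zeroing the smallest weights) and uniform weight quantization (snapping weights to a small set of levels). This is a minimal NumPy sketch for intuition only, not any specific paper's method; the function names and the 90%/2-bit settings are illustrative choices.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    # Parameter pruning: zero out the given fraction of smallest-magnitude weights.
    k = int(sparsity * weights.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    return np.where(np.abs(weights) > threshold, weights, 0.0)

def uniform_quantize(weights, bits):
    # Weight quantization: map weights onto 2**bits evenly spaced levels,
    # then dequantize so the values can be compared against the originals.
    levels = 2 ** bits - 1
    lo, hi = weights.min(), weights.max()
    scale = (hi - lo) / levels
    return np.round((weights - lo) / scale) * scale + lo

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)

pruned = magnitude_prune(w, 0.9)
print(f"sparsity after pruning: {np.mean(pruned == 0):.2f}")

quant = uniform_quantize(w, 2)
print(f"distinct values after 2-bit quantization: {np.unique(quant).size}")
```

In practice both operations are followed by fine-tuning to recover accuracy, and quantized weights are stored as small integer codes plus a scale/offset rather than as floats.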

Papers

Showing 251-260 of 1356 papers

Title | Status | Hype
Aerial Image Classification in Scarce and Unconstrained Environments via Conformal Prediction | | 0
From Large to Super-Tiny: End-to-End Optimization for Cost-Efficient LLMs | | 0
D^2MoE: Dual Routing and Dynamic Scheduling for Efficient On-Device MoE-based LLM Serving | | 0
ImPart: Importance-Aware Delta-Sparsification for Improved Model Compression and Merging in LLMs | Code | 0
Efficient Hybrid Language Model Compression through Group-Aware SSM Pruning | | 0
APSQ: Additive Partial Sum Quantization with Algorithm-Hardware Co-Design | Code | 0
Two is Better than One: Efficient Ensemble Defense for Robust and Compact Models | | 0
Thanos: A Block-wise Pruning Algorithm for Efficient Large Language Model Compression | Code | 0
Compression Laws for Large Language Models | | 0
RingMoE: Mixture-of-Modality-Experts Multi-Modal Foundation Models for Universal Remote Sensing Image Interpretation | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | MobileBERT + 2bit-1dim model compression using DKM | Accuracy | 82.13 | | Unverified
2 | MobileBERT + 1bit-1dim model compression using DKM | Accuracy | 63.17 | | Unverified