SOTAVerified

Model Compression

Model compression has been an actively pursued area of research in recent years, with the goal of deploying state-of-the-art deep networks on low-power, resource-limited devices without a significant drop in accuracy. Parameter pruning, low-rank factorization, and weight quantization are among the methods proposed to reduce the size of deep networks.

Source: KD-MRI: A knowledge distillation framework for image reconstruction and image restoration in MRI workflow
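
To make two of the methods named above concrete, here is a minimal sketch in plain Python of magnitude-based parameter pruning and uniform weight quantization on a flat list of weights. The function names `magnitude_prune` and `uniform_quantize` and the sample weights are illustrative assumptions, not taken from any listed paper; real systems apply these ideas per layer, usually followed by fine-tuning. Low-rank factorization is left out because it needs a linear algebra library.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude.

    Hypothetical helper for illustration only.
    """
    k = int(len(weights) * sparsity)  # number of weights to remove
    if k == 0:
        return list(weights)
    # Indices of the k weights with the smallest absolute value.
    by_magnitude = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(by_magnitude[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def uniform_quantize(weights, bits):
    """Snap each weight to the nearest of 2**bits evenly spaced levels.

    Hypothetical helper for illustration only; assumes bits >= 1.
    """
    lo, hi = min(weights), max(weights)
    if hi == lo:
        return list(weights)  # a constant tensor needs no quantization
    step = (hi - lo) / (2 ** bits - 1)  # spacing between adjacent levels
    return [lo + round((w - lo) / step) * step for w in weights]


if __name__ == "__main__":
    w = [0.9, -0.05, 0.4, -0.7, 0.02, 0.3]  # made-up example weights
    print(magnitude_prune(w, 0.5))   # half of the entries become 0.0
    print(uniform_quantize(w, 2))    # at most 4 distinct values remain
```

Pruning saves space only when the zeros are stored sparsely or whole structures are removed, and quantization saves space only when the level indices (here 2 bits each) are stored in place of the floating-point values. The sketch shows the value transformation alone.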

Papers

Showing 251-300 of 1356 papers

| Title | Status | Hype |
| --- | --- | --- |
| Aerial Image Classification in Scarce and Unconstrained Environments via Conformal Prediction | | 0 |
| From Large to Super-Tiny: End-to-End Optimization for Cost-Efficient LLMs | | 0 |
| ImPart: Importance-Aware Delta-Sparsification for Improved Model Compression and Merging in LLMs | Code | 0 |
| D^2MoE: Dual Routing and Dynamic Scheduling for Efficient On-Device MoE-based LLM Serving | | 0 |
| Efficient Hybrid Language Model Compression through Group-Aware SSM Pruning | | 0 |
| APSQ: Additive Partial Sum Quantization with Algorithm-Hardware Co-Design | Code | 0 |
| Two is Better than One: Efficient Ensemble Defense for Robust and Compact Models | | 0 |
| Thanos: A Block-wise Pruning Algorithm for Efficient Large Language Model Compression | Code | 0 |
| Compression Laws for Large Language Models | | 0 |
| RingMoE: Mixture-of-Modality-Experts Multi-Modal Foundation Models for Universal Remote Sensing Image Interpretation | | 0 |
| Compositionality Unlocks Deep Interpretable Models | | 0 |
| Random Conditioning with Distillation for Data-Efficient Diffusion Model Compression | | 0 |
| Penrose Tiled Low-Rank Compression and Section-Wise Q&A Fine-Tuning: A General Framework for Domain-Specific Large Language Model Adaptation | | 0 |
| Multi-Task Semantic Communications via Large Models | | 0 |
| Delving Deep into Semantic Relation Distillation | | 0 |
| MoQa: Rethinking MoE Quantization with Multi-stage Data-model Distribution Awareness | | 0 |
| Boosting Large Language Models with Mask Fine-Tuning | Code | 0 |
| Q-MambaIR: Accurate Quantized Mamba for Efficient Image Restoration | | 0 |
| A Low-Power Streaming Speech Enhancement Accelerator For Edge Devices | | 0 |
| Temporal Action Detection Model Compression by Progressive Block Drop | | 0 |
| Large Language Model Compression via the Nested Activation-Aware Decomposition | | 0 |
| InhibiDistilbert: Knowledge Distillation for a ReLU and Addition-based Transformer | | 0 |
| CompMarkGS: Robust Watermarking for Compressed 3D Gaussian Splatting | | 0 |
| ClusComp: A Simple Paradigm for Model Compression and Efficient Finetuning | | 0 |
| Fragile Mastery: Are Domain-Specific Trade-Offs Undermining On-Device Language Models? | | 0 |
| Sometimes Painful but Certainly Promising: Feasibility and Trade-offs of Language Model Inference at the Edge | | 0 |
| Position-Aware Depth Decay Decoding (D^3): Boosting Large Language Model Inference Efficiency | | 0 |
| Are We There Yet? A Measurement Study of Efficiency for LLM Applications on Mobile Devices | | 0 |
| Towards Superior Quantization Accuracy: A Layer-sensitive Approach | | 0 |
| IteRABRe: Iterative Recovery-Aided Block Reduction | | 0 |
| ACAM-KD: Adaptive and Cooperative Attention Masking for Knowledge Distillation | | 0 |
| Empowering Edge Intelligence: A Comprehensive Survey on On-Device AI Models | | 0 |
| CASP: Compression of Large Multimodal Models Based on Attention Sparsity | Code | 0 |
| TinyR1-32B-Preview: Boosting Accuracy with Branch-Merge Distillation | | 0 |
| LVLM-Compress-Bench: Benchmarking the Broader Impact of Large Vision-Language Model Compression | Code | 0 |
| 10K is Enough: An Ultra-Lightweight Binarized Network for Infrared Small-Target Detection | | 0 |
| Beyond the Tip of Efficiency: Uncovering the Submerged Threats of Jailbreak Attacks in Small Language Models | | 0 |
| Vision Transformers on the Edge: A Comprehensive Survey of Model Compression and Acceleration Strategies | | 0 |
| AfroXLMR-Comet: Multilingual Knowledge Distillation with Attention Matching for Low-Resource languages | | 0 |
| The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve? | | 0 |
| Swallowing the Poison Pills: Insights from Vulnerability Disparity Among LLMs | | 0 |
| When Compression Meets Model Compression: Memory-Efficient Double Compression for Large Language Models | | 0 |
| Optimizing Singular Spectrum for Large Language Model Compression | | 0 |
| Efficient AI in Practice: Training and Deployment of Efficient LLMs for Industry Applications | | 0 |
| Vision Foundation Models in Medical Image Analysis: Advances and Challenges | | 0 |
| MaskPrune: Mask-based LLM Pruning for Layer-wise Uniform Structures | | 0 |
| Every Expert Matters: Towards Effective Knowledge Distillation for Mixture-of-Experts Language Models | | 0 |
| OPTISHEAR: Towards Efficient and Adaptive Pruning of Large Language Models via Evolutionary Optimization | | 0 |
| Vision-Language Models for Edge Networks: A Comprehensive Survey | | 0 |
| Runtime Tunable Tsetlin Machines for Edge Inference on eFPGAs | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | MobileBERT + 2bit-1dim model compression using DKM | Accuracy | 82.13 | | Unverified |
| 2 | MobileBERT + 1bit-1dim model compression using DKM | Accuracy | 63.17 | | Unverified |