SOTAVerified

Model Compression

Model compression has been an actively pursued area of research over the last few years, with the goal of deploying state-of-the-art deep networks on low-power, resource-limited devices without a significant drop in accuracy. Parameter pruning, low-rank factorization, and weight quantization are some of the methods proposed to reduce the size of deep networks.

Source: KD-MRI: A knowledge distillation framework for image reconstruction and image restoration in MRI workflow
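As a rough illustration of two of the methods named above, parameter pruning and weight quantization, the following is a minimal sketch in plain Python. It is not taken from the cited source. The function names (`magnitude_prune`, `quantize_uniform`, `dequantize`), the magnitude-based pruning criterion, and the symmetric uniform quantization grid are all assumptions chosen for illustration. Low-rank factorization is omitted because it would normally rely on a linear algebra library.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights (unstructured pruning).

    Hypothetical helper for illustration; `sparsity` is the fraction to remove.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def quantize_uniform(weights, num_bits):
    """Symmetric uniform quantization: signed integer codes plus one shared scale.

    Hypothetical helper for illustration; assumes a single scale per tensor.
    """
    if num_bits < 2:
        raise ValueError("num_bits must be at least 2 for a signed grid")
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest grid point and clamp to the representable range.
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale for c in codes]


weights = [0.9, -0.05, 0.4, -0.7, 0.01, 0.2]
pruned = magnitude_prune(weights, sparsity=0.5)
codes, scale = quantize_uniform(weights, num_bits=4)
restored = dequantize(codes, scale)
```

In this sketch, pruning reduces the number of nonzero parameters, while quantization keeps every parameter but stores it at lower precision. Each restored weight differs from its original by at most half of `scale`.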

Papers

Showing 301-350 of 1356 papers

Title | Status | Hype
Runtime Tunable Tsetlin Machines for Edge Inference on eFPGAs | | 0
Systematic Outliers in Large Language Models | Code | 0
Synergistic Effects of Knowledge Distillation and Structured Pruning for Self-Supervised Speech Models | | 0
Theoretical Guarantees for Low-Rank Compression of Deep Neural Networks | | 0
Accelerating Linear Recurrent Neural Networks for the Edge with Unstructured Sparsity | | 0
MIND: Modality-Informed Knowledge Distillation Framework for Multimodal Clinical Prediction Tasks | | 0
Role of Mixup in Topological Persistence Based Knowledge Distillation for Wearable Sensor Data | | 0
Attention Sinks and Outlier Features: A 'Catch, Tag, and Release' Mechanism for Embeddings | | 0
Huff-LLM: End-to-End Lossless Compression for Efficient LLM Inference | | 0
Pivoting Factorization: A Compact Meta Low-Rank Representation of Sparsity for Efficient Inference in Large Language Models | | 0
Efficient Supernet Training with Orthogonal Softmax for Scalable ASR Model Compression | | 0
Perforated Backpropagation: A Neuroscience Inspired Extension to Artificial Neural Networks | Code | 0
TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models | | 0
On Accelerating Edge AI: Optimizing Resource-Constrained Environments | | 0
You Only Prune Once: Designing Calibration-Free Model Compression With Policy Learning | | 0
SwiftPrune: Hessian-Free Weight Pruning for Large Language Models | | 0
Practical quantum federated learning and its experimental demonstration | | 0
MultiPruner: Balanced Structure Removal in Foundation Models | | 0
FASP: Fast and Accurate Structured Pruning of Large Language Models | | 0
Knowledge Distillation for Image Restoration : Simultaneous Learning from Degraded and Clean Images | | 0
Atleus: Accelerating Transformers on the Edge Enabled by 3D Heterogeneous Manycore Architectures | | 0
SWSC: Shared Weight for Similar Channel in LLM | | 0
Tensorization of neural networks for improved privacy and interpretability | Code | 0
Neural Architecture Codesign for Fast Physics Applications | Code | 0
UPAQ: A Framework for Real-Time and Energy-Efficient 3D Object Detection in Autonomous Vehicles | | 0
CURing Large Models: Compression via CUR Decomposition | | 0
Effective and Efficient Mixed Precision Quantization of Speech Foundation Models | | 0
Strategic Fusion Optimizes Transformer Compression | | 0
Optimizing Small Language Models for In-Vehicle Function-Calling | | 0
DeepCompress-ViT: Rethinking Model Compression to Enhance Efficiency of Vision Transformers at the Edge | Code | 0
Once-Tuning-Multiple-Variants: Tuning Once and Expanded as Multiple Vision-Language Model Variants | | 0
Random Conditioning for Diffusion Model Compression with Distillation | | 0
Improving Acoustic Scene Classification in Low-Resource Conditions | | 0
Feature Alignment-Based Knowledge Distillation for Efficient Compression of Large Language Models | | 0
Optimization and Scalability of Collaborative Filtering Algorithms in Large Language Models | | 0
HTR-JAND: Handwritten Text Recognition with Joint Attention Network and Knowledge Distillation | Code | 0
Edge-AI for Agriculture: Lightweight Vision Models for Disease Detection in Resource-Limited Settings | | 0
GQSA: Group Quantization and Sparsity for Accelerating Large Language Model Inference | | 0
CoSurfGS:Collaborative 3D Surface Gaussian Splatting with Distributed Learning for Large Scene Reconstruction | | 0
Singular Value Scaling: Efficient Generative Model Compression via Pruned Weights Refinement | Code | 0
Lightweight Design and Optimization methods for DCNNs: Progress and Futures | | 0
Semantics Prompting Data-Free Quantization for Low-Bit Vision Transformers | | 0
Deploying Foundation Model Powered Agent Services: A Survey | | 0
RemoteTrimmer: Adaptive Structural Pruning for Remote Sensing Image Classification | Code | 0
TrimLLM: Progressive Layer Dropping for Domain-Specific LLMs | | 0
Can Students Beyond The Teacher? Distilling Knowledge from Teacher's Bias | | 0
Activation Sparsity Opportunities for Compressing General Large Language Models | | 0
Optimising TinyML with Quantization and Distillation of Transformer and Mamba Models for Indoor Localisation on Edge Devices | | 0
Low-Rank Correction for Quantized LLMs | | 0
Lossless Model Compression via Joint Low-Rank Factorization Optimization | | 0
Page 7 of 28

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | MobileBERT + 2bit-1dim model compression using DKM | Accuracy | 82.13 | | Unverified
2 | MobileBERT + 1bit-1dim model compression using DKM | Accuracy | 63.17 | | Unverified