SOTAVerified

Knowledge Distillation

Knowledge distillation is the process of transferring knowledge from a large model to a smaller one. While large models (such as very deep neural networks or ensembles of many models) have higher knowledge capacity than small models, this capacity may not be fully utilized, and evaluating a large model is costly regardless of how much of that capacity it uses. Distillation trains a compact "student" model to reproduce the behavior of a large "teacher" model, typically by matching the teacher's output distribution, so that much of the teacher's accuracy can be retained at a fraction of the inference cost.
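The classic recipe (Hinton et al., 2015) trains the student against a blend of the teacher's temperature-softened output distribution and the ground-truth labels. Below is a minimal PyTorch sketch of that loss, assuming `teacher`, `student`, and an optimizer already exist; the temperature `T` and mixing weight `alpha` are illustrative hyperparameters, not values taken from any paper listed on this page.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    # Soft-target term: KL divergence between the temperature-softened
    # teacher and student distributions, scaled by T^2 so gradient
    # magnitudes stay comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target term: ordinary cross-entropy against the true labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

def train_step(student, teacher, batch, optimizer):
    inputs, labels = batch
    with torch.no_grad():  # the teacher is frozen during distillation
        teacher_logits = teacher(inputs)
    student_logits = student(inputs)
    loss = distillation_loss(student_logits, teacher_logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Many of the papers below replace or augment this logit-matching term (with feature imitation, contrastive objectives, and so on), but the teacher/student structure is the same.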

Papers

Showing 1501–1550 of 4240 papers

Title | Status | Hype
GenDistiller: Distilling Pre-trained Language Models based on an Autoregressive Generative Model | – | 0
Extreme Compression for Pre-trained Transformers Made Simple and Efficient | – | 0
Compression of Deep Learning Models for Text: A Survey | – | 0
Extremely Small BERT Models from Mixed-Vocabulary Training | – | 0
Continual Self-Supervised Learning with Masked Autoencoders in Remote Sensing | – | 0
Face to Cartoon Incremental Super-Resolution using Knowledge Distillation | – | 0
Continuation KD: Improved Knowledge Distillation through the Lens of Continuation Optimization | – | 0
AutoADR: Automatic Model Design for Ad Relevance | – | 0
Generalized Supervised Contrastive Learning | – | 0
Factorized Distillation: Training Holistic Person Re-identification Model by Distilling an Ensemble of Partial ReID Models | – | 0
Compression of Acoustic Event Detection Models With Quantized Distillation | – | 0
Factual Dialogue Summarization via Learning from Large Language Models | – | 0
Selective Cross-Task Distillation | – | 0
Failure-Resilient Distributed Inference with Model Compression over Heterogeneous Edge Devices | – | 0
G-DetKD: Towards General Distillation Framework for Object Detectors via Contrastive and Semantic-guided Feature Imitation | – | 0
Fair Feature Distillation for Visual Recognition | – | 0
Generalized Continual Zero-Shot Learning | – | 0
Fairly Predicting Graft Failure in Liver Transplant for Organ Assigning | – | 0
Compressing Visual-linguistic Model via Knowledge Distillation | – | 0
Enhancing Generalization in Chain of Thought Reasoning for Smaller Models | – | 0
Fair Text to Medical Image Diffusion Model with Subgroup Distribution Aligned Tuning | – | 0
Faithful Knowledge Distillation | – | 0
A Theoretical Analysis of Soft-Label vs Hard-Label Training in Neural Networks | – | 0
Enhancing Few-shot Keyword Spotting Performance through Pre-Trained Self-supervised Speech Models | – | 0
Enhancing Data-Free Adversarial Distillation with Activation Regularization and Virtual Interpolation | – | 0
Fall Detection using Knowledge Distillation Based Long short-term memory for Offline Embedded and Low Power Devices | – | 0
Compressing VAE-Based Out-of-Distribution Detectors for Embedded Deployment | – | 0
GAN-Knowledge Distillation for one-stage Object Detection | – | 0
Enhancing CTC-Based Visual Speech Recognition | – | 0
Compressing Recurrent Neural Networks for FPGA-accelerated Implementation in Fluorescence Lifetime Imaging | – | 0
Enhancing Content Representation for AR Image Quality Assessment Using Knowledge Distillation | – | 0
Fast and Efficient Once-For-All Networks for Diverse Hardware Deployment | – | 0
Fast and High-Performance Learned Image Compression With Improved Checkerboard Context Model, Deformable Residual Module, and Knowledge Distillation | – | 0
Contrast-reconstruction Representation Learning for Self-supervised Skeleton-based Action Recognition | – | 0
Enhancing Chinese Multi-Label Text Classification Performance with Response-based Knowledge Distillation | – | 0
A Technical Study into Small Reasoning Language Models | – | 0
Gap Preserving Distillation by Building Bidirectional Mappings with A Dynamic Teacher | – | 0
Enhancing Adversarial Training with Prior Knowledge Distillation for Robust Image Compression | – | 0
Convolutional Neural Network Compression through Generalized Kronecker Product Decomposition | – | 0
Compressing Image-to-Image Translation GANs Using Local Density Structures on Their Learned Manifold | – | 0
Faster Inference of Integer SWIN Transformer by Removing the GELU Activation | – | 0
Compressing GANs using Knowledge Distillation | – | 0
Enhancing Action Recognition from Low-Quality Skeleton Data via Part-Level Knowledge Distillation | – | 0
Fast Real-time Personalized Speech Enhancement: End-to-End Enhancement Network (E3Net) and Knowledge Distillation | – | 0
Fast Sampling Through The Reuse Of Attention Maps In Diffusion Models | – | 0
A Generalized and Robust Method Towards Practical Gaze Estimation on Smart Phone | – | 0
FastSR-NeRF: Improving NeRF Efficiency on Consumer Devices with A Simple Super-Resolution Pipeline | – | 0
Fast Streaming Transducer ASR Prototyping via Knowledge Distillation with Whisper | – | 0
Galileo at SemEval-2020 Task 12: Multi-lingual Learning for Offensive Language Identification using Pre-trained Language Models | – | 0
Enhancing Accuracy and Parameter-Efficiency of Neural Representations for Network Parameterization | – | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | ScaleKD (T: BEiT-L, S: ViT-B/14) | Top-1 accuracy (%) | 86.43 | – | Unverified
2 | ScaleKD (T: Swin-L, S: ViT-B/16) | Top-1 accuracy (%) | 85.53 | – | Unverified
3 | ScaleKD (T: Swin-L, S: ViT-S/16) | Top-1 accuracy (%) | 83.93 | – | Unverified
4 | ScaleKD (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 83.8 | – | Unverified
5 | KD++ (T: regnety-16GF, S: ViT-B) | Top-1 accuracy (%) | 83.6 | – | Unverified
6 | VkD (T: RegNety 160, S: DeiT-S) | Top-1 accuracy (%) | 82.9 | – | Unverified
7 | SpectralKD (T: Swin-S, S: Swin-T) | Top-1 accuracy (%) | 82.7 | – | Unverified
8 | ScaleKD (T: Swin-L, S: ResNet-50) | Top-1 accuracy (%) | 82.55 | – | Unverified
9 | DiffKD (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 82.5 | – | Unverified
10 | DIST (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 82.3 | – | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | SRD (T: resnet-32x4, S: shufflenet-v2) | Top-1 accuracy (%) | 79.86 | – | Unverified
2 | shufflenet-v2 (T: resnet-32x4, S: shufflenet-v2) | Top-1 accuracy (%) | 78.76 | – | Unverified
3 | MV-MR (T: CLIP/ViT-B-16, S: resnet50) | Top-1 accuracy (%) | 78.6 | – | Unverified
4 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 78.28 | – | Unverified
5 | resnet8x4 (T: resnet32x4, S: resnet8x4 [modified]) | Top-1 accuracy (%) | 78.08 | – | Unverified
6 | ReviewKD++ (T: resnet-32x4, S: shufflenet-v2) | Top-1 accuracy (%) | 77.93 | – | Unverified
7 | ReviewKD++ (T: resnet-32x4, S: shufflenet-v1) | Top-1 accuracy (%) | 77.68 | – | Unverified
8 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 77.5 | – | Unverified
9 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 76.68 | – | Unverified
10 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 76.31 | – | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | LSHFM (T: ResNet101, S: ResNet50) | mAP | 93.17 | – | Unverified
2 | LSHFM (T: ResNet101, S: MobileNetV2) | mAP | 90.14 | – | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | TIE-KD (T: Adabins, S: MobileNetV2) | RMSE | 2.43 | – | Unverified
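In these tables, the Claimed column records the number reported by the paper, and Verified is filled in once the result has been independently reproduced. A minimal sketch of what such a check could look like for a top-1 accuracy claim, assuming a pretrained student model and a DataLoader over the matching validation split; `top1_accuracy`, `verify_claim`, and the 0.1-point tolerance are illustrative names and values, not this site's actual verification pipeline:

```python
import torch

@torch.no_grad()
def top1_accuracy(model, loader, device="cpu"):
    # Fraction of samples whose highest-scoring class matches the label.
    model.eval().to(device)
    correct = total = 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        preds = model(images).argmax(dim=-1)
        correct += (preds == labels).sum().item()
        total += labels.numel()
    return 100.0 * correct / total

def verify_claim(model, loader, claimed, tol=0.1):
    # Hypothetical rule: mark a claim verified if the reproduced top-1
    # accuracy falls within `tol` percentage points of the claimed value.
    measured = top1_accuracy(model, loader)
    status = "Verified" if abs(measured - claimed) <= tol else "Mismatch"
    return measured, status
```

A reproduction can still differ from the paper through preprocessing, resolution, or checkpoint mismatches, which is why claimed and verified numbers are tracked as separate columns.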