
Knowledge Distillation

Knowledge distillation is the process of transferring knowledge from a large model to a smaller one. While large models (such as very deep neural networks or ensembles of many models) have higher knowledge capacity than small models, this capacity may not be fully utilized. Distillation therefore trains a compact "student" model to reproduce the outputs (or intermediate representations) of a larger "teacher", often retaining most of the teacher's accuracy at a fraction of the inference cost.
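
For reference, the canonical recipe is the soft-target loss of Hinton et al. (2015): the student is trained to match the teacher's temperature-softened output distribution alongside the ground-truth labels. The PyTorch sketch below is illustrative only; the function name distillation_loss and the hyperparameter values T and alpha are assumptions for the example, not taken from any paper listed on this page.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      T: float = 4.0,
                      alpha: float = 0.9) -> torch.Tensor:
    """Soft-target knowledge distillation loss (after Hinton et al., 2015).

    T and alpha are illustrative hyperparameters, not values from any
    specific paper on this page.
    """
    # Soft term: KL divergence between the temperature-softened teacher and
    # student distributions; softening exposes the teacher's relative class
    # similarities ("dark knowledge") rather than just its top prediction.
    soft_student = F.log_softmax(student_logits / T, dim=-1)
    soft_teacher = F.softmax(teacher_logits / T, dim=-1)
    soft_loss = F.kl_div(soft_student, soft_teacher, reduction="batchmean")
    # Hard term: ordinary cross-entropy against the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    # The T**2 factor keeps the soft-term gradients on a comparable scale
    # across temperatures.
    return alpha * (T ** 2) * soft_loss + (1.0 - alpha) * hard_loss
```

In a typical training step the teacher's logits are computed under torch.no_grad() so that only the student's parameters receive gradient updates.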

Papers

Showing 751–800 of 4240 papers

Title | Status | Hype
Can LLM Watermarks Robustly Prevent Unauthorized Knowledge Distillation? | Code | 1
Distilling a Powerful Student Model via Online Knowledge Distillation | Code | 1
Lightweight Transformers for Clinical Natural Language Processing | Code | 1
A Discrepancy Aware Framework for Robust Anomaly Detection | Code | 1
Distilled Semantics for Comprehensive Scene Understanding from Videos | Code | 1
Distilling Audio-Visual Knowledge by Compositional Contrastive Learning | Code | 1
Camera clustering for scalable stream-based active distillation | Code | 1
Contrastive Deep Supervision | Code | 1
Anomaly Detection in Video via Self-Supervised and Multi-Task Learning | Code | 1
Decomposed Knowledge Distillation for Class-Incremental Semantic Segmentation | Code | 1
DistillBEV: Boosting Multi-Camera 3D Object Detection with Cross-Modal Knowledge Distillation | Code | 1
Contrastive Model Inversion for Data-Free Knowledge Distillation | Code | 1
Contrastive Representation Distillation | Code | 1
AutoGAN-Distiller: Searching to Compress Generative Adversarial Networks | Code | 1
Distilling Autoregressive Models to Obtain High-Performance Non-Autoregressive Solvers for Vehicle Routing Problems with Faster Inference Speed | Code | 1
LQER: Low-Rank Quantization Error Reconstruction for LLMs | Code | 1
Distilling Object Detectors via Decoupled Features | Code | 1
Deep Graph-level Anomaly Detection by Glocal Knowledge Distillation | Code | 1
DialoKG: Knowledge-Structure Aware Task-Oriented Dialogue Generation | Code | 1
Mask-invariant Face Recognition through Template-level Knowledge Distillation | Code | 1
mCLIP: Multilingual CLIP via Cross-lingual Transfer | Code | 1
MDFlow: Unsupervised Optical Flow Learning by Reliable Mutual Knowledge Distillation | Code | 1
MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks | Code | 1
ME-D2N: Multi-Expert Domain Decompositional Network for Cross-Domain Few-Shot Learning | Code | 1
Distillation and Refinement of Reasoning in Small Language Models for Document Re-ranking | Code | 1
Meta-DMoE: Adapting to Domain Shift by Meta-Distillation from Mixture-of-Experts | Code | 1
Meta-Learning based Degradation Representation for Blind Super-Resolution | Code | 1
BERT Learns to Teach: Knowledge Distillation with Meta Learning | Code | 1
CaMEL: Mean Teacher Learning for Image Captioning | Code | 1
MetricGAN-OKD: Multi-Metric Optimization of MetricGAN via Online Knowledge Distillation for Speech Enhancement | Code | 1
Distillation-Based Training for Multi-Exit Architectures | Code | 1
CaKDP: Category-aware Knowledge Distillation and Pruning Framework for Lightweight 3D Object Detection | Code | 1
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | Code | 1
DeepKD: A Deeply Decoupled and Denoised Knowledge Distillation Trainer | Code | 1
MixSKD: Self-Knowledge Distillation from Mixup for Image Recognition | Code | 1
ML-Doctor: Holistic Risk Assessment of Inference Attacks Against Machine Learning Models | Code | 1
Distilling Knowledge from Graph Convolutional Networks | Code | 1
MobileIQA: Exploiting Mobile-level Diverse Opinion Network For No-Reference Image Quality Assessment Using Knowledge Distillation | Code | 1
A Knowledge Distillation Framework For Enhancing Ear-EEG Based Sleep Staging With Scalp-EEG Data | Code | 1
Modality-Balanced Learning for Multimedia Recommendation | Code | 1
MonoSKD: General Distillation Framework for Monocular 3D Object Detection via Spearman Correlation Coefficient | Code | 1
MonoTAKD: Teaching Assistant Knowledge Distillation for Monocular 3D Object Detection | Code | 1
Mosaicking to Distill: Knowledge Distillation from Out-of-Domain Data | Code | 1
MPCFormer: fast, performant and private Transformer inference with MPC | Code | 1
DistilCSE: Effective Knowledge Distillation For Contrastive Sentence Embeddings | Code | 1
Creating Something from Nothing: Unsupervised Knowledge Distillation for Cross-Modal Hashing | Code | 1
Multi-Label Knowledge Distillation | Code | 1
Multi-Level Branched Regularization for Federated Learning | Code | 1
Multimodal and multiview distillation for real-time player detection on a football field | Code | 1
DisCo: Distilled Student Models Co-training for Semi-supervised Text Mining | Code | 1
Page 16 of 85

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | ScaleKD (T: BEiT-L, S: ViT-B/14) | Top-1 accuracy (%) | 86.43 | – | Unverified
2 | ScaleKD (T: Swin-L, S: ViT-B/16) | Top-1 accuracy (%) | 85.53 | – | Unverified
3 | ScaleKD (T: Swin-L, S: ViT-S/16) | Top-1 accuracy (%) | 83.93 | – | Unverified
4 | ScaleKD (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 83.8 | – | Unverified
5 | KD++ (T: regnety-16GF, S: ViT-B) | Top-1 accuracy (%) | 83.6 | – | Unverified
6 | VkD (T: RegNety 160, S: DeiT-S) | Top-1 accuracy (%) | 82.9 | – | Unverified
7 | SpectralKD (T: Swin-S, S: Swin-T) | Top-1 accuracy (%) | 82.7 | – | Unverified
8 | ScaleKD (T: Swin-L, S: ResNet-50) | Top-1 accuracy (%) | 82.55 | – | Unverified
9 | DiffKD (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 82.5 | – | Unverified
10 | DIST (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 82.3 | – | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SRD (T: resnet-32x4, S: shufflenet-v2) | Top-1 accuracy (%) | 79.86 | – | Unverified
2 | shufflenet-v2 (T: resnet-32x4, S: shufflenet-v2) | Top-1 accuracy (%) | 78.76 | – | Unverified
3 | MV-MR (T: CLIP/ViT-B-16, S: resnet50) | Top-1 accuracy (%) | 78.6 | – | Unverified
4 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 78.28 | – | Unverified
5 | resnet8x4 (T: resnet32x4, S: resnet8x4 [modified]) | Top-1 accuracy (%) | 78.08 | – | Unverified
6 | ReviewKD++ (T: resnet-32x4, S: shufflenet-v2) | Top-1 accuracy (%) | 77.93 | – | Unverified
7 | ReviewKD++ (T: resnet-32x4, S: shufflenet-v1) | Top-1 accuracy (%) | 77.68 | – | Unverified
8 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 77.5 | – | Unverified
9 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 76.68 | – | Unverified
10 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 76.31 | – | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | LSHFM (T: ResNet101, S: ResNet50) | mAP | 93.17 | – | Unverified
2 | LSHFM (T: ResNet101, S: MobileNetV2) | mAP | 90.14 | – | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TIE-KD (T: Adabins, S: MobileNetV2) | RMSE | 2.43 | – | Unverified