
Knowledge Distillation

Knowledge distillation is the process of transferring knowledge from a large model to a smaller one. While large models (such as very deep neural networks or ensembles of many models) have higher knowledge capacity than small models, this capacity might not be fully utilized. Distillation exploits this gap: a compact student model is trained to reproduce the behavior of a larger teacher, typically by matching the teacher's softened output probabilities in addition to the ground-truth labels, and can then be deployed at a fraction of the compute and memory cost.
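
A minimal sketch of the classic soft-target loss (Hinton et al., 2015) makes this concrete. The `temperature` and `alpha` defaults below are illustrative choices, not values taken from any of the papers listed on this page.

```python
# Classic soft-target knowledge distillation loss, sketched in PyTorch.
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.9):
    """Blend a soft KL term against the teacher with the hard-label CE term."""
    # Soften both output distributions with the same temperature.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # Scale the KL term by T^2 so its gradient magnitude stays comparable
    # to the cross-entropy term as the temperature changes.
    kd_term = F.kl_div(log_soft_student, soft_teacher,
                       reduction="batchmean") * temperature ** 2
    ce_term = F.cross_entropy(student_logits, labels)
    return alpha * kd_term + (1.0 - alpha) * ce_term
```

In a training loop the teacher is frozen, with `teacher_logits` computed under `torch.no_grad()`; many of the papers below replace or augment this logit-matching term with feature- or attention-level objectives.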

Papers

Showing 1051–1100 of 4240 papers

Title | Status | Hype
Learning Lightweight Lane Detection CNNs by Self Attention Distillation | Code | 0
Leveraging knowledge distillation for partial multi-task learning from multiple remote sensing datasets | Code | 0
MST-KD: Multiple Specialized Teachers Knowledge Distillation for Fair Face Recognition | Code | 0
Leaning Compact and Representative Features for Cross-Modality Person Re-Identification | Code | 0
Be Your Own Teacher: Improve the Performance of Convolutional Neural Networks via Self Distillation | Code | 0
Dealing With Heterogeneous 3D MR Knee Images: A Federated Few-Shot Learning Method With Dual Knowledge Distillation | Code | 0
Language-Universal Adapter Learning with Knowledge Distillation for End-to-End Multilingual Speech Recognition | Code | 0
Large-Scale Data-Free Knowledge Distillation for ImageNet via Multi-Resolution Data Generation | Code | 0
Beyond the Limitation of Monocular 3D Detector via Knowledge Distillation | Code | 0
DCA: Dividing and Conquering Amnesia in Incremental Object Detection | Code | 0
Data Upcycling Knowledge Distillation for Image Super-Resolution | Code | 0
Dataset Distillation via Knowledge Distillation: Towards Efficient Self-Supervised Pre-Training of Deep Networks | Code | 0
KS-DETR: Knowledge Sharing in Attention Learning for Detection Transformer | Code | 0
Language Model Knowledge Distillation for Efficient Question Answering in Spanish | Code | 0
KnowledgeSG: Privacy-Preserving Synthetic Text Generation with Knowledge Distillation from Server | Code | 0
Adaptive Mixing of Auxiliary Losses in Supervised Learning | Code | 0
Data-Free Knowledge Distillation for Image Super-Resolution | Code | 0
Data-free Knowledge Distillation for Fine-grained Visual Categorization | Code | 0
Beyond Conventional Transformers: The Medical X-ray Attention (MXA) Block for Improved Multi-Label Diagnosis Using Knowledge Distillation | Code | 0
Knowledge Extraction with No Observable Data | Code | 0
Knowledge Grafting of Large Language Models | Code | 0
Knowledge Transfer Graph for Deep Collaborative Learning | Code | 0
Data-free Knowledge Distillation for Segmentation using Data-Enriching GAN | Code | 0
Data-Free Generative Replay for Class-Incremental Learning on Imbalanced Data | Code | 0
Knowledge Distillation with Adversarial Samples Supporting Decision Boundary | Code | 0
Knowledge Distillation with Reptile Meta-Learning for Pretrained Language Model Compression | Code | 0
Knowledge Distillation via Instance Relationship Graph | Code | 0
Data-Free Adversarial Distillation | Code | 0
Data exploitation: multi-task learning of object detection and semantic segmentation on partially annotated data | Code | 0
Better Teacher Better Student: Dynamic Prior Knowledge for Knowledge Distillation | Code | 0
Better Supervisory Signals by Observing Learning Paths | Code | 0
Align-to-Distill: Trainable Attention Alignment for Knowledge Distillation in Neural Machine Translation | Code | 0
Knowledge Distillation of Russian Language Models with Reduction of Vocabulary | Code | 0
Knowledge Distillation Layer that Lets the Student Decide | Code | 0
Knowledge Distillation Performs Partial Variance Reduction | Code | 0
DASK: Distribution Rehearsing via Adaptive Style Kernel Learning for Exemplar-Free Lifelong Person Re-Identification | Code | 0
Knowledge Distillation from Single to Multi Labels: an Empirical Study | Code | 0
Knowledge Distillation from Cross Teaching Teachers for Efficient Semi-Supervised Abdominal Organ Segmentation in CT | Code | 0
Few Sample Knowledge Distillation for Efficient Network Compression | Code | 0
Knowledge Distillation in RNN-Attention Models for Early Prediction of Student Performance | Code | 0
DAD++: Improved Data-free Test Time Adversarial Defense | Code | 0
DAdEE: Unsupervised Domain Adaptation in Early Exit PLMs | Code | 0
BEiT v2: Masked Image Modeling with Vector-Quantized Visual Tokenizers | Code | 0
Being Strong Progressively! Enhancing Knowledge Distillation of Large Language Models through a Curriculum Learning Framework | Code | 0
Knowledge Distillation for Quality Estimation | Code | 0
Knowledge Distillation for Singing Voice Detection | Code | 0
Aligning (Medical) LLMs for (Counterfactual) Fairness | Code | 0
D^2TV: Dual Knowledge Distillation and Target-oriented Vision Modeling for Many-to-Many Multimodal Summarization | Code | 0
cViL: Cross-Lingual Training of Vision-Language Models using Knowledge Distillation | Code | 0
BEBERT: Efficient and Robust Binary Ensemble BERT | Code | 0

Page 22 of 85

Benchmark Results

In the tables below, T: denotes the teacher model and S: the student. Claimed is the result reported for the method; the Verified column remains empty until the result has been independently reproduced, hence the Unverified status.

# | Model | Metric | Claimed | Verified | Status
1 | ScaleKD (T: BEiT-L, S: ViT-B/14) | Top-1 accuracy (%) | 86.43 | - | Unverified
2 | ScaleKD (T: Swin-L, S: ViT-B/16) | Top-1 accuracy (%) | 85.53 | - | Unverified
3 | ScaleKD (T: Swin-L, S: ViT-S/16) | Top-1 accuracy (%) | 83.93 | - | Unverified
4 | ScaleKD (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 83.8 | - | Unverified
5 | KD++ (T: RegNetY-16GF, S: ViT-B) | Top-1 accuracy (%) | 83.6 | - | Unverified
6 | VkD (T: RegNetY-160, S: DeiT-S) | Top-1 accuracy (%) | 82.9 | - | Unverified
7 | SpectralKD (T: Swin-S, S: Swin-T) | Top-1 accuracy (%) | 82.7 | - | Unverified
8 | ScaleKD (T: Swin-L, S: ResNet-50) | Top-1 accuracy (%) | 82.55 | - | Unverified
9 | DiffKD (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 82.5 | - | Unverified
10 | DIST (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 82.3 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SRD (T: resnet32x4, S: shufflenet-v2) | Top-1 accuracy (%) | 79.86 | - | Unverified
2 | shufflenet-v2 (T: resnet32x4, S: shufflenet-v2) | Top-1 accuracy (%) | 78.76 | - | Unverified
3 | MV-MR (T: CLIP/ViT-B-16, S: resnet50) | Top-1 accuracy (%) | 78.6 | - | Unverified
4 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 78.28 | - | Unverified
5 | resnet8x4 (T: resnet32x4, S: resnet8x4 [modified]) | Top-1 accuracy (%) | 78.08 | - | Unverified
6 | ReviewKD++ (T: resnet32x4, S: shufflenet-v2) | Top-1 accuracy (%) | 77.93 | - | Unverified
7 | ReviewKD++ (T: resnet32x4, S: shufflenet-v1) | Top-1 accuracy (%) | 77.68 | - | Unverified
8 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 77.5 | - | Unverified
9 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 76.68 | - | Unverified
10 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 76.31 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | LSHFM (T: ResNet101, S: ResNet50) | mAP | 93.17 | - | Unverified
2 | LSHFM (T: ResNet101, S: MobileNetV2) | mAP | 90.14 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TIE-KD (T: Adabins, S: MobileNetV2) | RMSE | 2.43 | - | Unverified