Knowledge Distillation

Knowledge distillation is the process of transferring knowledge from a large model to a smaller one. While large models (such as very deep neural networks or ensembles of many models) have higher knowledge capacity than small models, this capacity might not be fully utilized.
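The description above does not commit to a particular training objective. As an illustration only, one widely used formulation (the soft-target loss) trains the small "student" model to match the temperature-softened output distribution of the large "teacher" model. The sketch below assumes that formulation; the function names, the example logits, and the default temperature of 4.0 are hypothetical choices for this example, not taken from this page.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Assumption: this is the soft-target loss only; in practice it is
    usually combined with a standard loss on the ground-truth labels.
    """
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return temperature ** 2 * kl


# Hypothetical logits for a 3-class problem.
teacher = [5.0, 2.0, -1.0]
student = [3.0, 2.5, 0.0]

loss_mismatch = distillation_loss(student, teacher)  # positive: outputs differ
loss_match = distillation_loss(teacher, teacher)     # zero: outputs identical
```

A higher temperature flattens both distributions, so the student also learns from the relative probabilities the teacher assigns to incorrect classes.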

Papers

Showing 201–250 of 4240 papers

Title	Date	Tasks	Status	Hype	Score
Black-Box Attacks on Sequential Recommenders via Data-Free Model Extraction	Sep 1, 2021	Data Poisoning, Knowledge Distillation	Code Available	1	5
Black-box Few-shot Knowledge Distillation	Jul 25, 2022	image-classification, Image Classification	Code Available	1	5
AIM 2024 Challenge on UHD Blind Photo Quality Assessment	Sep 24, 2024	4k, Computational Efficiency	Code Available	1	5
Adjoined Networks: A Training Paradigm with Applications to Network Compression	Jun 10, 2020	Knowledge Distillation, Neural Architecture Search	Code Available	1	5
Block-Wisely Supervised Neural Architecture Search With Knowledge Distillation	Jun 1, 2020	Knowledge Distillation, Neural Architecture Search	Code Available	1	5
Aligned Structured Sparsity Learning for Efficient Image Super-Resolution	Dec 1, 2021	Image Super-Resolution, Knowledge Distillation	Code Available	1	5
Can LLM Watermarks Robustly Prevent Unauthorized Knowledge Distillation?	Feb 17, 2025	Knowledge Distillation, Language Modeling	Code Available	1	5
Boosting Multi-Label Image Classification with Complementary Parallel Self-Distillation	May 23, 2022	image-classification, Image Classification	Code Available	1	5
Bootstrapping meaning through listening: Unsupervised learning of spoken sentence embeddings	Oct 23, 2022	Acoustic Unit Discovery, Contrastive Learning	Code Available	1	5
DeepKD: A Deeply Decoupled and Denoised Knowledge Distillation Trainer	May 21, 2025	Denoising, Knowledge Distillation	Code Available	1	5
Align-KD: Distilling Cross-Modal Alignment Knowledge for Mobile Vision-Language Model	Dec 2, 2024	cross-modal alignment, Knowledge Distillation	Code Available	1	5
Align-KD: Distilling Cross-Modal Alignment Knowledge for Mobile Vision-Language Large Model Enhancement	Jan 1, 2025	cross-modal alignment, Knowledge Distillation	Code Available	1	5
Breaking Modality Gap in RGBT Tracking: Coupled Knowledge Distillation	Oct 15, 2024	Knowledge Distillation, Rgb-T Tracking	Code Available	1	5
Bridge Past and Future: Overcoming Information Asymmetry in Incremental Object Detection	Jul 16, 2024	Knowledge Distillation, object-detection	Code Available	1	5
DARTS: Double Attention Reference-based Transformer for Super-resolution	Jul 17, 2023	Image Super-Resolution, Knowledge Distillation	Code Available	1	5
Bridging the Domain Gap: Self-Supervised 3D Scene Understanding with Foundation Models	May 15, 2023	3D Object Detection, Image Captioning	Code Available	1	5
C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval	Oct 7, 2022	Knowledge Distillation, Retrieval	Code Available	1	5
Fcaformer: Forward Cross Attention in Hybrid Vision Transformer	Nov 14, 2022	Image Classification, Knowledge Distillation	Code Available	1	5
AlphaFold Distillation for Protein Design	Oct 5, 2022	Diversity, Drug Discovery	Code Available	1	5
DASpeech: Directed Acyclic Transformer for Fast and High-quality Speech-to-Speech Translation	Oct 11, 2023	Decoder, fr-en	Code Available	1	5
CaMEL: Mean Teacher Learning for Image Captioning	Feb 21, 2022	Image Captioning, Knowledge Distillation	Code Available	1	5
AltDiffusion: A Multilingual Text-to-Image Diffusion Model	Aug 19, 2023	Blocking, Concept Alignment	Code Available	1	5
Better Estimation of the KL Divergence Between Language Models	Apr 14, 2025	Knowledge Distillation	Code Available	1	5
CEN-HDR: Computationally Efficient neural Network for real-time High Dynamic Range imaging	Feb 10, 2023	Efficient Neural Network, Knowledge Distillation	Code Available	1	5
DialoKG: Knowledge-Structure Aware Task-Oriented Dialogue Generation	Apr 19, 2022	Dialogue Generation, Knowledge Distillation	Code Available	1	5
Dice Semimetric Losses: Optimizing the Dice Score with Soft Labels	Mar 28, 2023	Knowledge Distillation	Code Available	1	5
CCL: Continual Contrastive Learning for LiDAR Place Recognition	Mar 24, 2023	Autonomous Driving, Continual Learning	Code Available	1	5
Categorical Relation-Preserving Contrastive Knowledge Distillation for Medical Image Classification	Jul 7, 2021	Classification, image-classification	Code Available	1	5
AMFD: Distillation via Adaptive Multimodal Fusion for Multispectral Pedestrian Detection	May 21, 2024	Knowledge Distillation, Pedestrian Detection	Code Available	1	5
Adaptive Multi-Teacher Knowledge Distillation with Meta-Learning	Jun 11, 2023	Knowledge Distillation, Meta-Learning	Code Available	1	5
Adaptive Multi-Teacher Multi-level Knowledge Distillation	Mar 6, 2021	Knowledge Distillation	Code Available	1	5
Channel Distillation: Channel-Wise Attention for Knowledge Distillation	Jun 2, 2020	Knowledge Distillation	Code Available	1	5
Understanding the Role of the Projector in Knowledge Distillation	Mar 20, 2023	image-classification, Image Classification	Code Available	1	5
Channel-Aware Distillation Transformer for Depth Estimation on Nano Drones	Mar 18, 2023	Autonomous Navigation, Depth Estimation	Code Available	1	5
Channel Gating Neural Networks	May 29, 2018	Knowledge Distillation, Network Pruning	Code Available	1	5
Channel-wise Knowledge Distillation for Dense Prediction	Nov 26, 2020	Knowledge Distillation, Prediction	Code Available	1	5
CheXseg: Combining Expert Annotations with DNN-generated Saliency Maps for X-ray Segmentation	Feb 21, 2021	Image Segmentation, Knowledge Distillation	Code Available	1	5
Directed Acyclic Transformer for Non-Autoregressive Machine Translation	May 16, 2022	Knowledge Distillation, Machine Translation	Code Available	1	5
BERT-of-Theseus: Compressing BERT by Progressive Module Replacing	Feb 7, 2020	Knowledge Distillation, Model Compression	Code Available	1	5
Class Attention Transfer Based Knowledge Distillation	Apr 25, 2023	Knowledge Distillation, Model Compression	Code Available	1	5
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter	Oct 2, 2019	Hate Speech Detection, Knowledge Distillation	Code Available	1	5
Class-Balanced Distillation for Long-Tailed Visual Recognition	Apr 12, 2021	Image Classification, Knowledge Distillation	Code Available	1	5
Adapt Your Teacher: Improving Knowledge Distillation for Exemplar-free Continual Learning	Aug 18, 2023	class-incremental learning, Class Incremental Learning	Code Available	1	5
Class-Incremental Learning by Knowledge Distillation with Adaptive Feature Consolidation	Apr 2, 2022	class-incremental learning, Class Incremental Learning	Code Available	1	5
Curriculum Temperature for Knowledge Distillation	Nov 29, 2022	Image Classification, Knowledge Distillation	Code Available	1	5
Distillation-Based Training for Multi-Exit Architectures	Oct 1, 2019	Knowledge Distillation	Code Available	1	5
CLIP-guided Federated Learning on Heterogeneous and Long-Tailed Data	Dec 14, 2023	Contrastive Learning, Federated Learning	Code Available	1	5
CLIP-Embed-KD: Computationally Efficient Knowledge Distillation Using Embeddings as Teachers	Apr 9, 2024	Knowledge Distillation, Zero-shot Generalization	Code Available	1	5
CLIP-KD: An Empirical Study of CLIP Model Distillation	Jul 24, 2023	Contrastive Learning, Cross-Modal Retrieval	Code Available	1	5
AICSD: Adaptive Inter-Class Similarity Distillation for Semantic Segmentation	Aug 8, 2023	Knowledge Distillation, Semantic Segmentation	Code Available	1	5

Datasets: ImageNet, CIFAR-100, COCO (Common Objects in Context), COCO 2017 val, PASCAL VOC, KITTI

Benchmark Results

ImageNet

#	Model	Metric	Claimed	Verified	Status
1	ScaleKD (T:BEiT-L S:ViT-B/14)	Top-1 accuracy %	86.43	—	Unverified
2	ScaleKD (T:Swin-L S:ViT-B/16)	Top-1 accuracy %	85.53	—	Unverified
3	ScaleKD (T:Swin-L S:ViT-S/16)	Top-1 accuracy %	83.93	—	Unverified
4	ScaleKD (T:Swin-L S:Swin-T)	Top-1 accuracy %	83.8	—	Unverified
5	KD++(T: regnety-16GF S:ViT-B)	Top-1 accuracy %	83.6	—	Unverified
6	VkD (T:RegNety 160 S:DeiT-S)	Top-1 accuracy %	82.9	—	Unverified
7	SpectralKD (T:Swin-S S:Swin-T)	Top-1 accuracy %	82.7	—	Unverified
8	ScaleKD (T:Swin-L S:ResNet-50)	Top-1 accuracy %	82.55	—	Unverified
9	DiffKD (T:Swin-L S: Swin-T)	Top-1 accuracy %	82.5	—	Unverified
10	DIST (T: Swin-L S: Swin-T)	Top-1 accuracy %	82.3	—	Unverified

CIFAR-100

#	Model	Metric	Claimed	Verified	Status
1	SRD (T:resnet-32x4, S:shufflenet-v2)	Top-1 Accuracy (%)	79.86	—	Unverified
2	shufflenet-v2(T:resnet-32x4, S:shufflenet-v2)	Top-1 Accuracy (%)	78.76	—	Unverified
3	MV-MR (T: CLIP/ViT-B-16 S: resnet50)	Top-1 Accuracy (%)	78.6	—	Unverified
4	resnet8x4 (T: resnet32x4 S: resnet8x4)	Top-1 Accuracy (%)	78.28	—	Unverified
5	resnet8x4 (T: resnet32x4 S: resnet8x4 [modified])	Top-1 Accuracy (%)	78.08	—	Unverified
6	ReviewKD++(T:resnet-32x4, S:shufflenet-v2)	Top-1 Accuracy (%)	77.93	—	Unverified
7	ReviewKD++(T:resnet-32x4, S:shufflenet-v1)	Top-1 Accuracy (%)	77.68	—	Unverified
8	resnet8x4 (T: resnet32x4 S: resnet8x4)	Top-1 Accuracy (%)	77.5	—	Unverified
9	resnet8x4 (T: resnet32x4 S: resnet8x4)	Top-1 Accuracy (%)	76.68	—	Unverified
10	resnet8x4 (T: resnet32x4 S: resnet8x4)	Top-1 Accuracy (%)	76.31	—	Unverified

COCO (Common Objects in Context)

#	Model	Metric	Claimed	Verified	Status
1	LSHFM (T: ResNet101 S: ResNet50)	mAP	77.16	—	Unverified
2	LSHFM (T: ResNet101 S: MobileNetV2)	mAP	73.73	—	Unverified
3	ADLIK-Faster (T: Faster R-CNN vit-base S: Faster R-CNN deit-small)	box AP	47.6	—	Unverified
4	ADLIK-Mask (T: Mask R-CNN vit-base S: Mask R-CNN deit-small)	mask AP	42.4	—	Unverified

COCO 2017 val

#	Model	Metric	Claimed	Verified	Status
1	ReviewKD++(T: faster rcnn(resnet101), S:faster rcnn(resnet50))	AP@0.5	61.8	—	Unverified
2	ReviewKD++(T: faster rcnn(resnet101), S:faster rcnn(resnet18))	AP@0.5	57.96	—	Unverified
3	ReviewKD++(T: faster rcnn(resnet101), S:faster rcnn(mobilenet-v2))	AP@0.5	55.18	—	Unverified

PASCAL VOC

#	Model	Metric	Claimed	Verified	Status
1	LSHFM (T: ResNet101 S: ResNet50)	mAP	93.17	—	Unverified
2	LSHFM (T: ResNet101 S: MobileNetV2)	mAP	90.14	—	Unverified

KITTI

#	Model	Metric	Claimed	Verified	Status
1	TIE-KD (T: Adabins S: MobileNetV2)	RMSE	2.43	—	Unverified