Knowledge Distillation

Knowledge distillation is the process of transferring knowledge from a large model to a smaller one. While large models (such as very deep neural networks or ensembles of many models) have higher knowledge capacity than small models, this capacity might not be fully utilized.

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1326–1350 of 4240 papers

Title	Date	Tasks	Status	Score
DynaMMo: Dynamic Model Merging for Efficient Class Incremental Learning for Medical Images	Apr 22, 2024	class-incremental learningClass Incremental Learning	CodeCode Available	5
Cluster-aware Semi-supervised Learning: Relational Knowledge Distillation Provably Learns Clustering	Jul 20, 2023	ClusteringData Augmentation	CodeCode Available	5
Induced Model Matching: Restricted Models Help Train Full-Featured Models	Jan 15, 2025	Knowledge DistillationLanguage Modeling	CodeCode Available	5
InDistill: Information flow-preserving knowledge distillation for model compression	May 20, 2022	Knowledge DistillationModel Compression	CodeCode Available	5
Efficient Ternary Weight Embedding Model: Bridging Scalability and Performance	Nov 23, 2024	Computational EfficiencyKnowledge Distillation	CodeCode Available	5
Distilling Knowledge by Mimicking Features	Nov 3, 2020	Knowledge Distillationobject-detection	CodeCode Available	5
Induced Model Matching: How Restricted Models Can Help Larger Ones	Feb 19, 2024	Knowledge DistillationLanguage Modeling	CodeCode Available	5
Dynamic Sub-graph Distillation for Robust Semi-supervised Continual Learning	Dec 27, 2023	Continual Learninggraph construction	CodeCode Available	5
Knowledge Extraction with No Observable Data	Dec 1, 2019	Data-free Knowledge DistillationKnowledge Distillation	CodeCode Available	5
Incremental Meta-Learning via Episodic Replay Distillation for Few-Shot Image Recognition	Nov 9, 2021	Continual LearningKnowledge Distillation	CodeCode Available	5
PruMUX: Augmenting Data Multiplexing with Model Compression	May 24, 2023	Knowledge Distillationmodel	CodeCode Available	5
Dynamic Rectification Knowledge Distillation	Jan 27, 2022	Edge-computingKnowledge Distillation	CodeCode Available	5
Incorporating Graph Information in Transformer-based AMR Parsing	Jun 23, 2023	Abstract Meaning RepresentationAMR Parsing	CodeCode Available	5
UNIKD: UNcertainty-filtered Incremental Knowledge Distillation for Neural Implicit Representation	Dec 21, 2022	3D ReconstructionIncremental Learning	CodeCode Available	5
Improving Question Answering Performance Using Knowledge Distillation and Active Learning	Sep 26, 2021	Active LearningKnowledge Distillation	CodeCode Available	5
Improving Neural Topic Models with Wasserstein Knowledge Distillation	Mar 27, 2023	Knowledge DistillationTopic Models	CodeCode Available	5
Improving Respiratory Sound Classification with Architecture-Agnostic Knowledge Distillation from Ensembles	May 28, 2025	Knowledge DistillationSound Classification	CodeCode Available	5
Closest Neighbors are Harmful for Lightweight Masked Auto-encoders	Jan 1, 2025	Knowledge Distillation	CodeCode Available	5
Improving Neural Architecture Search Image Classifiers via Ensemble Learning	Mar 14, 2019	Ensemble LearningImage Classification	CodeCode Available	5
3M-Health: Multimodal Multi-Teacher Knowledge Distillation for Mental Health Detection	Jul 12, 2024	Knowledge DistillationSocial Media Mental Health Detection	CodeCode Available	5
DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition	Jul 16, 2025	BenchmarkingKnowledge Distillation	CodeCode Available	5
Improving Stance Detection with Multi-Dataset Learning and Knowledge Distillation	Nov 1, 2021	Knowledge DistillationStance Detection	CodeCode Available	5
Improving generalizability of distilled self-supervised speech processing models under distorted settings	Oct 14, 2022	Knowledge Distillation	CodeCode Available	5
Improving Robustness by Enhancing Weak Subnets	Jan 30, 2022	Adversarial RobustnessData Augmentation	CodeCode Available	5
Improving End-to-End Speech Translation by Imitation-Based Knowledge Distillation with Synthetic Transcripts	Jul 17, 2023	automatic-speech-translationImitation Learning	CodeCode Available	5

Show:10 25 50

← PrevPage 54 of 170Next →

All datasets ImageNet CIFAR-100 COCO (Common Objects in Context)COCO 2017 val PASCAL VOC KITTI

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	ScaleKD (T:BEiT-L S:ViT-B/14)	Top-1 accuracy %	86.43	—	Unverified
2	ScaleKD (T:Swin-L S:ViT-B/16)	Top-1 accuracy %	85.53	—	Unverified
3	ScaleKD (T:Swin-L S:ViT-S/16)	Top-1 accuracy %	83.93	—	Unverified
4	ScaleKD (T:Swin-L S:Swin-T)	Top-1 accuracy %	83.8	—	Unverified
5	KD++(T: regnety-16GF S:ViT-B)	Top-1 accuracy %	83.6	—	Unverified
6	VkD (T:RegNety 160 S:DeiT-S)	Top-1 accuracy %	82.9	—	Unverified
7	SpectralKD (T:Swin-S S:Swin-T)	Top-1 accuracy %	82.7	—	Unverified
8	ScaleKD (T:Swin-L S:ResNet-50)	Top-1 accuracy %	82.55	—	Unverified
9	DiffKD (T:Swin-L S: Swin-T)	Top-1 accuracy %	82.5	—	Unverified
10	DIST (T: Swin-L S: Swin-T)	Top-1 accuracy %	82.3	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	SRD (T:resnet-32x4, S:shufflenet-v2)	Top-1 Accuracy (%)	79.86	—	Unverified
2	shufflenet-v2(T:resnet-32x4, S:shufflenet-v2)	Top-1 Accuracy (%)	78.76	—	Unverified
3	MV-MR (T: CLIP/ViT-B-16 S: resnet50)	Top-1 Accuracy (%)	78.6	—	Unverified
4	resnet8x4 (T: resnet32x4 S: resnet8x4)	Top-1 Accuracy (%)	78.28	—	Unverified
5	resnet8x4 (T: resnet32x4 S: resnet8x4 [modified])	Top-1 Accuracy (%)	78.08	—	Unverified
6	ReviewKD++(T:resnet-32x4, S:shufflenet-v2)	Top-1 Accuracy (%)	77.93	—	Unverified
7	ReviewKD++(T:resnet-32x4, S:shufflenet-v1)	Top-1 Accuracy (%)	77.68	—	Unverified
8	resnet8x4 (T: resnet32x4 S: resnet8x4)	Top-1 Accuracy (%)	77.5	—	Unverified
9	resnet8x4 (T: resnet32x4 S: resnet8x4)	Top-1 Accuracy (%)	76.68	—	Unverified
10	resnet8x4 (T: resnet32x4 S: resnet8x4)	Top-1 Accuracy (%)	76.31	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	LSHFM (T: ResNet101 S: ResNet50)	mAP	77.16	—	Unverified
2	LSHFM (T: ResNet101 S: MobileNetV2)	mAP	73.73	—	Unverified
3	ADLIK-Faster (T: Faster R-CNN vit-base S: Faster R-CNN deit-small)	box AP	47.6	—	Unverified
4	ADLIK-Mask (T: Mask R-CNN vit-base S: Mask R-CNN deit-small)	mask AP	42.4	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	ReviewKD++(T: faster rcnn(resnet101), S:faster rcnn(resnet50))	AP@0.5	61.8	—	Unverified
2	ReviewKD++(T: faster rcnn(resnet101), S:faster rcnn(resnet18))	AP@0.5	57.96	—	Unverified
3	ReviewKD++(T: faster rcnn(resnet101), S:faster rcnn(mobilenet-v2))	AP@0.5	55.18	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	LSHFM (T: ResNet101 S: ResNet50)	mAP	93.17	—	Unverified
2	LSHFM (T: ResNet101 S: MobileNetV2)	mAP	90.14	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	TIE-KD (T: Adabins S: MobileNetV2)	RMSE	2.43	—	Unverified