SOTAVerified

Knowledge Distillation

Knowledge distillation is the process of transferring knowledge from a large model to a smaller one. While large models (such as very deep neural networks or ensembles of many models) have higher knowledge capacity than small models, this capacity is often not fully utilized, so a compact student model trained to mimic the larger teacher can recover much of its accuracy at a fraction of the inference cost.
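
As a concrete illustration, below is a minimal sketch of the classic soft-target distillation loss (Hinton et al., 2015), assuming PyTorch and a standard classification setup; the function name, temperature T, and mixing weight alpha are illustrative choices, not taken from any paper listed below.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Combine softened teacher targets with the usual hard-label loss (sketch)."""
    # Temperature-softened distributions; the T**2 factor keeps gradient
    # magnitudes comparable across temperatures, as in the original formulation.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss

# Typical use inside a training step (teacher frozen, names hypothetical):
# with torch.no_grad():
#     teacher_logits = teacher(images)
# loss = distillation_loss(student(images), teacher_logits, labels)
# loss.backward()
```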

Papers

Showing 2801–2850 of 4240 papers

Title | Status | Hype
Targeted Forgetting of Image Subgroups in CLIP Models | — | 0
TAS: Distilling Arbitrary Teacher and Student via a Hybrid Assistant | — | 0
Task-Attentive Transformer Architecture for Continual Learning of Vision-and-Language Tasks Using Knowledge Distillation | — | 0
Task-Balanced Distillation for Object Detection | — | 0
TASKED: Transformer-based Adversarial learning for human activity recognition using wearable sensors via Self-KnowledgE Distillation | — | 0
Task Integration Distillation for Object Detectors | — | 0
Task-Specific Knowledge Distillation from the Vision Foundation Model for Enhanced Medical Image Segmentation | — | 0
Teacher's pet: understanding and mitigating biases in distillation | — | 0
Teacher-Student Architecture for Knowledge Learning: A Survey | — | 0
Teacher-Student Architecture for Knowledge Distillation: A Survey | — | 0
Teacher-Student chain for efficient semi-supervised histology image classification | — | 0
Teacher-Student Knowledge Distillation for Radar Perception on Embedded Accelerators | — | 0
Distilled Siamese Networks for Visual Tracking | — | 0
Teacher-Student Training and Triplet Loss for Facial Expression Recognition under Occlusion | — | 0
Teacher-Student Training and Triplet Loss to Reduce the Effect of Drastic Face Occlusion | — | 0
Teacher-Student Training for Robust Tacotron-based TTS | — | 0
Teaching-Assistant-in-the-Loop: Improving Knowledge Distillation from Imperfect Teacher Models in Low-Budget Scenarios | — | 0
"Teaching Independent Parts Separately" (TIPSy-GAN): Improving Accuracy and Stability in Unsupervised Adversarial 2D to 3D Pose Estimation | — | 0
Teaching MLP More Graph Information: A Three-stage Multitask Knowledge Distillation Framework | — | 0
Teaching pathology foundation models to accurately predict gene expression with parameter efficient knowledge transfer | — | 0
Teaching Small Language Models to Reason | — | 0
Teaching with Uncertainty: Unleashing the Potential of Knowledge Distillation in Object Detection | — | 0
Teach me with a Whisper: Enhancing Large Language Models for Analyzing Spoken Transcripts using Speech Embeddings | — | 0
Teach model to answer questions after comprehending the document | — | 0
Technical Report for ICCV 2021 Challenge SSLAD-Track3B: Transformers Are Better Continual Learners | — | 0
Technical Report of Team GraphMIRAcles in the WikiKG90M-LSC Track of OGB-LSC @ KDD Cup 2021 | — | 0
Technical report on Conversational Question Answering | — | 0
Temporal Knowledge Distillation for On-device Audio Classification | — | 0
Temporal Knowledge Distillation for Time-Sensitive Financial Services Applications | — | 0
Temporal reasoning for timeline summarisation in social media | — | 0
Temporal Separation with Entropy Regularization for Knowledge Distillation in Spiking Neural Networks | — | 0
TenTrans Large-Scale Multilingual Machine Translation System for WMT21 | — | 0
TernaryLLM: Ternarized Large Language Model | — | 0
Test-Time Adaptation Toward Personalized Speech Enhancement: Zero-Shot Learning with Knowledge Distillation | — | 0
Text is Text, No Matter What: Unifying Text Recognition using Knowledge Distillation | — | 0
The Best of Both Worlds: Accurate Global and Personalized Models through Federated Learning with Data-Free Hyper-Knowledge Distillation | — | 0
The economic trade-offs of large language models: A case study | — | 0
The Estimation of Continual Causal Effect for Dataset Shifting Streams | — | 0
The Graph's Apprentice: Teaching an LLM Low Level Knowledge for Circuit Quality Estimation | — | 0
The LMU Munich System for the WMT 2021 Large-Scale Multilingual Machine Translation Shared Task | — | 0
The Microsoft Toolkit of Multi-Task Deep Neural Networks for Natural Language Understanding | — | 0
The Mininglamp Machine Translation System for WMT21 | — | 0
The NiuTrans Machine Translation Systems for WMT19 | — | 0
The NiuTrans Machine Translation Systems for WMT21 | — | 0
The NiuTrans Machine Translation Systems for WMT20 | — | 0
The NiuTrans System for the WMT 2021 Efficiency Task | — | 0
The NLP Cookbook: Modern Recipes for Transformer based Deep Learning Architectures | — | 0
Theoretical Guarantees for LT-TTD: A Unified Transformer-based Architecture for Two-Level Ranking Systems | — | 0
The Privileged Students: On the Value of Initialization in Multilingual Knowledge Distillation | — | 0
The RoyalFlush System for the WMT 2022 Efficiency Task | — | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | ScaleKD (T: BEiT-L, S: ViT-B/14) | Top-1 accuracy (%) | 86.43 | — | Unverified
2 | ScaleKD (T: Swin-L, S: ViT-B/16) | Top-1 accuracy (%) | 85.53 | — | Unverified
3 | ScaleKD (T: Swin-L, S: ViT-S/16) | Top-1 accuracy (%) | 83.93 | — | Unverified
4 | ScaleKD (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 83.8 | — | Unverified
5 | KD++ (T: RegNetY-16GF, S: ViT-B) | Top-1 accuracy (%) | 83.6 | — | Unverified
6 | VkD (T: RegNetY-160, S: DeiT-S) | Top-1 accuracy (%) | 82.9 | — | Unverified
7 | SpectralKD (T: Swin-S, S: Swin-T) | Top-1 accuracy (%) | 82.7 | — | Unverified
8 | ScaleKD (T: Swin-L, S: ResNet-50) | Top-1 accuracy (%) | 82.55 | — | Unverified
9 | DiffKD (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 82.5 | — | Unverified
10 | DIST (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 82.3 | — | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | SRD (T: resnet-32x4, S: shufflenet-v2) | Top-1 accuracy (%) | 79.86 | — | Unverified
2 | shufflenet-v2 (T: resnet-32x4, S: shufflenet-v2) | Top-1 accuracy (%) | 78.76 | — | Unverified
3 | MV-MR (T: CLIP/ViT-B-16, S: resnet50) | Top-1 accuracy (%) | 78.6 | — | Unverified
4 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 78.28 | — | Unverified
5 | resnet8x4 (T: resnet32x4, S: resnet8x4 [modified]) | Top-1 accuracy (%) | 78.08 | — | Unverified
6 | ReviewKD++ (T: resnet-32x4, S: shufflenet-v2) | Top-1 accuracy (%) | 77.93 | — | Unverified
7 | ReviewKD++ (T: resnet-32x4, S: shufflenet-v1) | Top-1 accuracy (%) | 77.68 | — | Unverified
8 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 77.5 | — | Unverified
9 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 76.68 | — | Unverified
10 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 76.31 | — | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | LSHFM (T: ResNet101, S: ResNet50) | mAP | 93.17 | — | Unverified
2 | LSHFM (T: ResNet101, S: MobileNetV2) | mAP | 90.14 | — | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | TIE-KD (T: Adabins, S: MobileNetV2) | RMSE | 2.43 | — | Unverified