SOTAVerified

Knowledge Distillation

Knowledge distillation is the process of transferring knowledge from a large model to a smaller one. While large models (such as very deep neural networks or ensembles of many models) have higher knowledge capacity than small models, this capacity might not be fully utilized. In practice, a small student model is trained to match the large teacher's output distribution (its "soft" predictions) in addition to the ground-truth labels, letting it recover much of the teacher's accuracy at a fraction of the inference cost.
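
As a concrete illustration of the training objective most of the papers below build on, here is a minimal PyTorch sketch of classic soft-target distillation (Hinton et al., 2015): the student is penalized both for disagreeing with the ground-truth labels and for diverging from the teacher's temperature-softened predictions. The toy networks, temperature (4.0), and loss weighting (0.9) are illustrative assumptions, not values from any specific paper listed here.

    # Minimal soft-target distillation sketch (assumed hyperparameters).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels,
                          temperature=4.0, alpha=0.9):
        # Soften both output distributions with the temperature, then push
        # the student's softened distribution toward the teacher's.
        soft = F.kl_div(
            F.log_softmax(student_logits / temperature, dim=1),
            F.softmax(teacher_logits / temperature, dim=1),
            reduction="batchmean",
        ) * (temperature ** 2)  # rescale so gradient magnitude stays comparable
        hard = F.cross_entropy(student_logits, labels)  # ground-truth term
        return alpha * soft + (1.0 - alpha) * hard

    # Toy usage: distill a wider teacher MLP into a narrower student.
    teacher = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 10))
    student = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
    optimizer = torch.optim.SGD(student.parameters(), lr=0.1)

    x = torch.randn(8, 32)               # dummy batch of features
    labels = torch.randint(0, 10, (8,))  # dummy hard labels
    with torch.no_grad():                # the teacher is frozen
        teacher_logits = teacher(x)

    optimizer.zero_grad()
    loss = distillation_loss(student(x), teacher_logits, labels)
    loss.backward()
    optimizer.step()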

Papers

Showing 2751–2800 of 4240 papers

Title | Status | Hype
Stealing Neural Networks via Timing Side Channels | – | 0
Step Out and Seek Around: On Warm-Start Training with Incremental Data | – | 0
Stereo-Knowledge Distillation from dpMV to Dual Pixels for Light Field Video Reconstruction | – | 0
Stereo-Matching Knowledge Distilled Monocular Depth Estimation Filtered by Multiple Disparity Consistency | – | 0
STEVE Series: Step-by-Step Construction of Agent Systems in Minecraft | – | 0
Stingy Teacher: Sparse Logits Suffice to Fail Knowledge Distillation | – | 0
Stochastic Precision Ensemble: Self-Knowledge Distillation for Quantized Deep Neural Networks | – | 0
Strategic Fusion Optimizes Transformer Compression | – | 0
Streaming egocentric action anticipation: An evaluation scheme and approach | – | 0
Streaming Transformer ASR with Blockwise Synchronous Inference | – | 0
Structural and Statistical Texture Knowledge Distillation for Semantic Segmentation | – | 0
Structural Knowledge Distillation for Object Detection | – | 0
Structural Teacher-Student Normality Learning for Multi-Class Anomaly Detection and Localization | – | 0
Structure Aware Incremental Learning with Personalized Imitation Weights for Recommender Systems | – | 0
Structure-Centric Robust Monocular Depth Estimation via Knowledge Distillation | – | 0
Structured Knowledge Distillation Towards Efficient and Compact Multi-View 3D Detection | – | 0
Structured Pruning of Neural Networks with Budget-Aware Regularization | – | 0
StructVPR: Distill Structural Knowledge with Weighting Samples for Visual Place Recognition | – | 0
Student as an Inherent Denoiser of Noisy Teacher | – | 0
Student Customized Knowledge Distillation: Bridging the Gap Between Student and Teacher | – | 0
Student-friendly Knowledge Distillation | – | 0
Student Network Learning via Evolutionary Knowledge Distillation | – | 0
Student-Oriented Teacher Knowledge Refinement for Knowledge Distillation | – | 0
Students Parrot Their Teachers: Membership Inference on Model Distillation | – | 0
Students taught by multimodal teachers are superior action recognizers | – | 0
Students Who Study Together Learn Better: On the Importance of Collective Knowledge Distillation for Domain Transfer in Fact Verification | – | 0
Study of Encoder-Decoder Architectures for Code-Mix Search Query Translation | – | 0
Style over Substance: Distilled Language Models Reason Via Stylistic Replication | – | 0
Sub-Band Knowledge Distillation Framework for Speech Enhancement | – | 0
Subclass Knowledge Distillation with Known Subclass Labels | – | 0
Sub-Graph Learning for Spatiotemporal Forecasting via Knowledge Distillation | – | 0
SUGAR: Pre-training 3D Visual Representations for Robotics | – | 0
Supervised Graph Contrastive Pretraining for Text Classification | – | 0
Supervision Complexity and its Role in Knowledge Distillation | – | 0
Supporting Cross-language Cross-project Bug Localization Using Pre-trained Language Models | – | 0
Knowledge Distillation in Federated Edge Learning: A Survey | – | 0
Survey on Knowledge Distillation for Large Language Models: Methods, Evaluation, and Application | – | 0
Swing Distillation: A Privacy-Preserving Knowledge Distillation Framework | – | 0
SWITCH: Studying with Teacher for Knowledge Distillation of Large Language Models | – | 0
Synergic Adversarial Label Learning for Grading Retinal Diseases via Knowledge Distillation and Multi-task Learning | – | 0
Synergistic Effects of Knowledge Distillation and Structured Pruning for Self-Supervised Speech Models | – | 0
Syntactic Structure Distillation Pretraining For Bidirectional Encoders | – | 0
Synthetic Image Learning: Preserving Performance and Preventing Membership Inference Attacks | – | 0
Synthetic Unknown Class Learning for Learning Unknowns | – | 0
TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models | – | 0
Tailored Federated Learning: Leveraging Direction Regulation & Knowledge Distillation | – | 0
Take a Prior from Other Tasks for Severe Blur Removal | – | 0
TalkingMachines: Real-Time Audio-Driven FaceTime-Style Video via Autoregressive Diffusion Models | – | 0
Talking Models: Distill Pre-trained Knowledge to Downstream Models via Interactive Communication | – | 0
Target-driven Self-Distillation for Partial Observed Trajectories Forecasting | – | 0
Page 56 of 85

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | ScaleKD (T: BEiT-L, S: ViT-B/14) | Top-1 accuracy (%) | 86.43 | – | Unverified
2 | ScaleKD (T: Swin-L, S: ViT-B/16) | Top-1 accuracy (%) | 85.53 | – | Unverified
3 | ScaleKD (T: Swin-L, S: ViT-S/16) | Top-1 accuracy (%) | 83.93 | – | Unverified
4 | ScaleKD (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 83.8 | – | Unverified
5 | KD++ (T: RegNetY-16GF, S: ViT-B) | Top-1 accuracy (%) | 83.6 | – | Unverified
6 | VkD (T: RegNetY-160, S: DeiT-S) | Top-1 accuracy (%) | 82.9 | – | Unverified
7 | SpectralKD (T: Swin-S, S: Swin-T) | Top-1 accuracy (%) | 82.7 | – | Unverified
8 | ScaleKD (T: Swin-L, S: ResNet-50) | Top-1 accuracy (%) | 82.55 | – | Unverified
9 | DiffKD (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 82.5 | – | Unverified
10 | DIST (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 82.3 | – | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SRD (T: ResNet-32x4, S: ShuffleNetV2) | Top-1 accuracy (%) | 79.86 | – | Unverified
2 | ShuffleNetV2 (T: ResNet-32x4, S: ShuffleNetV2) | Top-1 accuracy (%) | 78.76 | – | Unverified
3 | MV-MR (T: CLIP/ViT-B-16, S: ResNet-50) | Top-1 accuracy (%) | 78.6 | – | Unverified
4 | ResNet-8x4 (T: ResNet-32x4, S: ResNet-8x4) | Top-1 accuracy (%) | 78.28 | – | Unverified
5 | ResNet-8x4 (T: ResNet-32x4, S: ResNet-8x4 [modified]) | Top-1 accuracy (%) | 78.08 | – | Unverified
6 | ReviewKD++ (T: ResNet-32x4, S: ShuffleNetV2) | Top-1 accuracy (%) | 77.93 | – | Unverified
7 | ReviewKD++ (T: ResNet-32x4, S: ShuffleNetV1) | Top-1 accuracy (%) | 77.68 | – | Unverified
8 | ResNet-8x4 (T: ResNet-32x4, S: ResNet-8x4) | Top-1 accuracy (%) | 77.5 | – | Unverified
9 | ResNet-8x4 (T: ResNet-32x4, S: ResNet-8x4) | Top-1 accuracy (%) | 76.68 | – | Unverified
10 | ResNet-8x4 (T: ResNet-32x4, S: ResNet-8x4) | Top-1 accuracy (%) | 76.31 | – | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | LSHFM (T: ResNet-101, S: ResNet-50) | mAP | 93.17 | – | Unverified
2 | LSHFM (T: ResNet-101, S: MobileNetV2) | mAP | 90.14 | – | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TIE-KD (T: AdaBins, S: MobileNetV2) | RMSE | 2.43 | – | Unverified
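
For these benchmarks, the Claimed column is the number reported by the authors, while Verified stays blank until the result has been independently reproduced. As a hedged sketch of what that check involves for the Top-1 accuracy tables above, the helper below evaluates any PyTorch classifier over a labeled evaluation loader; the trained model and the DataLoader are assumed to exist and are not shown.

    # Minimal Top-1 accuracy evaluation sketch (model/data loading assumed).
    import torch

    def top1_accuracy(model, loader, device="cpu"):
        # Fraction of examples whose highest-scoring class matches the label.
        model.eval().to(device)
        correct, total = 0, 0
        with torch.no_grad():
            for images, labels in loader:
                preds = model(images.to(device)).argmax(dim=1)
                correct += (preds == labels.to(device)).sum().item()
                total += labels.size(0)
        return 100.0 * correct / total

    # Hypothetical usage: compare the measured number against a claimed one,
    # e.g. the 86.43 claimed for ScaleKD above, within some small tolerance.
    # verified = abs(top1_accuracy(student, val_loader) - 86.43) < 0.05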