Knowledge Distillation

Knowledge distillation is the process of transferring knowledge from a large "teacher" model to a smaller "student" model. While large models (such as very deep neural networks or ensembles of many models) have a higher knowledge capacity than small models, that capacity is often not fully utilized, so a student trained to mimic the teacher's outputs can frequently retain most of the teacher's accuracy at a fraction of the inference cost.
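
As a concrete illustration of the transfer, the classic response-based recipe of Hinton et al. (2015) trains the student to match the teacher's temperature-softened output distribution while still fitting the hard labels. The sketch below is a minimal PyTorch rendition; the temperature `T` and mixing weight `alpha` are illustrative choices, not values taken from any paper listed on this page.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Response-based KD loss: soft teacher targets + hard-label cross-entropy."""
    # Soften both output distributions with temperature T so the teacher's
    # relative probabilities over wrong classes ("dark knowledge") are exposed.
    soft_student = F.log_softmax(student_logits / T, dim=-1)
    soft_teacher = F.softmax(teacher_logits / T, dim=-1)
    # The T**2 factor keeps the soft-term gradient magnitude comparable
    # across temperatures (Hinton et al., 2015).
    kd_term = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * (T * T)
    # Ordinary supervised loss on the student's unsoftened logits.
    ce_term = F.cross_entropy(student_logits, labels)
    return alpha * kd_term + (1.0 - alpha) * ce_term
```

Setting `alpha = 0` recovers plain supervised training of the student; many of the distillation variants in the paper list below swap in different matching losses (features, attention maps, relations) while keeping this overall teacher-student structure.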

Papers

Showing 3051–3100 of 4240 papers

Wavelet Knowledge Distillation: Towards Efficient Image-to-Image Translation
WAVE: Weight Template for Adaptive Initialization of Variable-sized Models
Weakly Supervised Cross-lingual Semantic Relation Classification via Knowledge Distillation
Weakly Supervised Dense Video Captioning via Jointly Usage of Knowledge Distillation and Cross-modal Matching
Weakly-Supervised Domain Adaptation of Deep Regression Trackers via Reinforced Knowledge Distillation
Weakly-supervised HOI Detection via Prior-guided Bi-level Representation Learning
Weakly Supervised Monocular 3D Detection with a Single-View Image
Weakly Supervised Semantic Segmentation via Alternative Self-Dual Teaching
Weak-to-Strong Backdoor Attack for Large Language Models
Wearable Accelerometer Foundation Models for Health via Knowledge Distillation
WebChild 2.0 : Fine-Grained Commonsense Knowledge Distillation
Web Content Filtering through knowledge distillation of Large Language Models
WebUOT-1M: Advancing Deep Underwater Object Tracking with A Million-Scale Benchmark
WeChat Neural Machine Translation Systems for WMT20
WeChat Neural Machine Translation Systems for WMT21
WeClick: Weakly-Supervised Video Semantic Segmentation with Click Annotations
Weight Averaging: A Simple Yet Effective Method to Overcome Catastrophic Forgetting in Automatic Speech Recognition
Weight Decay Scheduling and Knowledge Distillation for Active Learning
Weight Distillation: Transferring the Knowledge in Neural Network Parameters
Weighted KL-Divergence for Document Ranking Model Refinement
Weight Squeezing: Reparameterization for Compression and Fast Inference
Robustness Challenges in Model Distillation and Pruning for Natural Language Understanding
What do larger image classifiers memorise?
What Happens When Small Is Made Smaller? Exploring the Impact of Compression on Small Data Pretrained Language Models
What is Left After Distillation? How Knowledge Transfer Impacts Fairness and Bias
What is Lost in Knowledge Distillation?
What Knowledge Gets Distilled in Knowledge Distillation?
What Makes a Good Dataset for Knowledge Distillation?
When Chosen Wisely, More Data Is What You Need: A Universal Sample-Efficient Strategy For Data Augmentation
When Gradient Descent Meets Derivative-Free Optimization: A Match Made in Black-Box Scenario
Which Student is Best? A Comprehensive Knowledge Distillation Exam for Task-Specific BERT Models
DQ-Whisper: Joint Distillation and Quantization for Efficient Multilingual Speech Recognition
Whole-Slide Mitosis Detection in H&E Breast Histology Using PHH3 as a Reference to Train Distilled Stain-Invariant Convolutional Networks
Why distillation helps: a statistical perspective
Why Knowledge Distillation Amplifies Gender Bias and How to Mitigate from the Perspective of DistilBERT
Why Knowledge Distillation Works in Generative Models: A Minimal Working Explanation
Winning Big with Small Models: Knowledge Distillation vs. Self-Training for Reducing Hallucination in QA Agents
Win the Lottery Ticket via Fourier Analysis: Frequencies Guided Network Pruning
Wired Perspectives: Multi-View Wire Art Embraces Generative AI
Wisdom of Committee: Distilling from Foundation Model to Specialized Application Model
WK-Pnet: FM-Based Positioning via Wavelet Packet Decomposition and Knowledge Distillation
Word Sense Induction with Knowledge Distillation from BERT
X^3KD: Knowledge Distillation Across Modalities, Tasks and Stages for Multi-Camera 3D Object Detection
XCOMPS: A Multilingual Benchmark of Conceptual Minimal Pairs
XD: Cross-lingual Knowledge Distillation for Polyglot Sentence Embeddings
X-Distill: Improving Self-Supervised Monocular Depth via Cross-Task Distillation
Xiaomi's Submissions for IWSLT 2020 Open Domain Translation Task
X Modality Assisting RGBT Object Tracking
xVLM2Vec: Adapting LVLM-based embedding models to multilinguality using Self-Knowledge Distillation
Page 62 of 85

Benchmark Results

In each table below, "T:" names the teacher model and "S:" the student it is distilled into; "Claimed" is the result reported by the authors. No result on this page has been independently verified yet, hence the empty Verified column and the Unverified status throughout.

# | Model | Metric | Claimed | Verified | Status
1 | ScaleKD (T: BEiT-L, S: ViT-B/14) | Top-1 accuracy (%) | 86.43 | — | Unverified
2 | ScaleKD (T: Swin-L, S: ViT-B/16) | Top-1 accuracy (%) | 85.53 | — | Unverified
3 | ScaleKD (T: Swin-L, S: ViT-S/16) | Top-1 accuracy (%) | 83.93 | — | Unverified
4 | ScaleKD (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 83.8 | — | Unverified
5 | KD++ (T: RegNetY-16GF, S: ViT-B) | Top-1 accuracy (%) | 83.6 | — | Unverified
6 | VkD (T: RegNetY-160, S: DeiT-S) | Top-1 accuracy (%) | 82.9 | — | Unverified
7 | SpectralKD (T: Swin-S, S: Swin-T) | Top-1 accuracy (%) | 82.7 | — | Unverified
8 | ScaleKD (T: Swin-L, S: ResNet-50) | Top-1 accuracy (%) | 82.55 | — | Unverified
9 | DiffKD (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 82.5 | — | Unverified
10 | DIST (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 82.3 | — | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SRD (T: ResNet-32x4, S: ShuffleNet-v2) | Top-1 accuracy (%) | 79.86 | — | Unverified
2 | ShuffleNet-v2 (T: ResNet-32x4, S: ShuffleNet-v2) | Top-1 accuracy (%) | 78.76 | — | Unverified
3 | MV-MR (T: CLIP/ViT-B-16, S: ResNet-50) | Top-1 accuracy (%) | 78.6 | — | Unverified
4 | ResNet-8x4 (T: ResNet-32x4, S: ResNet-8x4) | Top-1 accuracy (%) | 78.28 | — | Unverified
5 | ResNet-8x4 (T: ResNet-32x4, S: ResNet-8x4 [modified]) | Top-1 accuracy (%) | 78.08 | — | Unverified
6 | ReviewKD++ (T: ResNet-32x4, S: ShuffleNet-v2) | Top-1 accuracy (%) | 77.93 | — | Unverified
7 | ReviewKD++ (T: ResNet-32x4, S: ShuffleNet-v1) | Top-1 accuracy (%) | 77.68 | — | Unverified
8 | ResNet-8x4 (T: ResNet-32x4, S: ResNet-8x4) | Top-1 accuracy (%) | 77.5 | — | Unverified
9 | ResNet-8x4 (T: ResNet-32x4, S: ResNet-8x4) | Top-1 accuracy (%) | 76.68 | — | Unverified
10 | ResNet-8x4 (T: ResNet-32x4, S: ResNet-8x4) | Top-1 accuracy (%) | 76.31 | — | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | LSHFM (T: ResNet-101, S: ResNet-50) | mAP | 93.17 | — | Unverified
2 | LSHFM (T: ResNet-101, S: MobileNetV2) | mAP | 90.14 | — | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TIE-KD (T: AdaBins, S: MobileNetV2) | RMSE | 2.43 | — | Unverified