
Knowledge Distillation

Knowledge distillation is the process of transferring knowledge from a large model to a smaller one. While large models (such as very deep neural networks or ensembles of many models) have higher knowledge capacity than small models, this capacity may not be fully utilized, yet the large models remain expensive to evaluate and deploy. Distillation therefore trains a small "student" model to reproduce the behavior of a large "teacher" model, typically by matching its softened output distribution, so that much of the teacher's accuracy is retained at a fraction of the inference cost.
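
As a concrete illustration, here is a minimal sketch of the classic soft-target distillation loss (Hinton et al., 2015) in PyTorch. The function name, variable names, and hyperparameters (temperature T, mixing weight alpha) are illustrative assumptions, not taken from any paper listed below.

```python
# Minimal sketch of soft-target knowledge distillation (illustrative, not from any listed paper).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Weighted sum of a softened KL term (teacher -> student) and hard-label cross-entropy."""
    # Soft targets: KL divergence between temperature-scaled output distributions.
    # The T*T factor keeps gradient magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: standard cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Usage sketch: the teacher is frozen; only the student receives gradients.
# teacher.eval()
# with torch.no_grad():
#     teacher_logits = teacher(images)
# loss = distillation_loss(student(images), teacher_logits, labels)
# loss.backward()
```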

Papers

Showing 1051–1100 of 4240 papers

Title | Status | Hype
DεpS: Delayed ε-Shrinking for Faster Once-For-All Training | – | 0
Boosting Graph Neural Networks via Adaptive Knowledge Distillation | – | 0
Deploying a BERT-based Query-Title Relevance Classifier in a Production System: a View from the Trenches | – | 0
Analyzing Knowledge Distillation in Neural Machine Translation | – | 0
Efficient Intent-Based Filtering for Multi-Party Conversations Using Knowledge Distillation from LLMs | – | 0
Densely Distilling Cumulative Knowledge for Continual Learning | – | 0
Boosting Contrastive Learning with Relation Knowledge Distillation | – | 0
BoostingBERT:Integrating Multi-Class Boosting into BERT for NLP Tasks | – | 0
Denoising Mutual Knowledge Distillation in Bi-Directional Multiple Instance Learning | – | 0
Analyzing Compression Techniques for Computer Vision | – | 0
Efficient Image Compression Using Advanced State Space Models | – | 0
Demystifying Catastrophic Forgetting in Two-Stage Incremental Object Detector | – | 0
Delving Deep into Semantic Relation Distillation | – | 0
Boosting Accuracy and Robustness of Student Models via Adaptive Adversarial Distillation | – | 0
BOLT: Bootstrap Long Chain-of-Thought in Language Models without Distillation | – | 0
An Active Learning Framework for Inclusive Generation by Large Language Models | – | 0
Adaptive Regularization of Labels | – | 0
Efficient Inference via Universal LSH Kernel | – | 0
Efficient Knowledge Distillation of SAM for Medical Image Segmentation | – | 0
DeGAN : Data-Enriching GAN for Retrieving Representative Samples from a Trained Classifier | – | 0
BLSP-KD: Bootstrapping Language-Speech Pre-training via Knowledge Distillation | – | 0
AMTSS: An Adaptive Multi-Teacher Single-Student Knowledge Distillation Framework For Multilingual Language Inference | – | 0
Defending against Data-Free Model Extraction by Distributionally Robust Defensive Training | – | 0
Deep versus Wide: An Analysis of Student Architectures for Task-Agnostic Knowledge Distillation of Self-Supervised Speech Models | – | 0
Deep-to-bottom Weights Decay: A Systemic Knowledge Review Learning Technique for Transformer Layers in Knowledge Distillation | – | 0
Block-wise Intermediate Representation Training for Model Compression | – | 0
Efficient Gravitational Wave Parameter Estimation via Knowledge Distillation: A ResNet1D-IAF Approach | – | 0
Deep Serial Number: Computational Watermarking for DNN Intellectual Property Protection | – | 0
Bidirectional Distillation: A Mixed-Play Framework for Multi-Agent Generalizable Behaviors | – | 0
Deep Semi-Supervised and Self-Supervised Learning for Diabetic Retinopathy Detection | – | 0
Amortized Noisy Channel Neural Machine Translation | – | 0
Efficient Federated Learning for AIoT Applications Using Knowledge Distillation | – | 0
Deep Representation Learning of Patient Data from Electronic Health Records (EHR): A Systematic Review | – | 0
Black-box Source-free Domain Adaptation via Two-stage Knowledge Distillation | – | 0
Deep Neural Network Models Compression | – | 0
Deep Neural Compression Via Concurrent Pruning and Self-Distillation | – | 0
Efficient Evaluation-Time Uncertainty Estimation by Improved Distillation | – | 0
Efficient Hybrid Language Model Compression through Group-Aware SSM Pruning | – | 0
Efficient Knowledge Distillation via Curriculum Extraction | – | 0
Deep Net Triage: Analyzing the Importance of Network Layers via Structural Compression | – | 0
Black-Box Dissector: Towards Erasing-based Hard-Label Model Stealing Attack | – | 0
Deep Learning for Medical Text Processing: BERT Model Fine-Tuning and Comparative Study | – | 0
Knowledge Distillation-aided End-to-End Learning for Linear Precoding in Multiuser MIMO Downlink Systems with Finite-Rate Feedback | – | 0
Deep Epidemiological Modeling by Black-box Knowledge Distillation: An Accurate Deep Learning Model for COVID-19 | – | 0
BJTU-WeChat's Systems for the WMT22 Chat Translation Task | – | 0
AMLN: Adversarial-based Mutual Learning Network for Online Knowledge Distillation | – | 0
Efficient Compression of Multitask Multilingual Speech Models | – | 0
Efficient AI in Practice: Training and Deployment of Efficient LLMs for Industry Applications | – | 0
Deep Collective Knowledge Distillation | – | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | ScaleKD (T: BEiT-L, S: ViT-B/14) | Top-1 accuracy (%) | 86.43 | – | Unverified
2 | ScaleKD (T: Swin-L, S: ViT-B/16) | Top-1 accuracy (%) | 85.53 | – | Unverified
3 | ScaleKD (T: Swin-L, S: ViT-S/16) | Top-1 accuracy (%) | 83.93 | – | Unverified
4 | ScaleKD (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 83.8 | – | Unverified
5 | KD++ (T: regnety-16GF, S: ViT-B) | Top-1 accuracy (%) | 83.6 | – | Unverified
6 | VkD (T: RegNety 160, S: DeiT-S) | Top-1 accuracy (%) | 82.9 | – | Unverified
7 | SpectralKD (T: Swin-S, S: Swin-T) | Top-1 accuracy (%) | 82.7 | – | Unverified
8 | ScaleKD (T: Swin-L, S: ResNet-50) | Top-1 accuracy (%) | 82.55 | – | Unverified
9 | DiffKD (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 82.5 | – | Unverified
10 | DIST (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 82.3 | – | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SRD (T: resnet-32x4, S: shufflenet-v2) | Top-1 Accuracy (%) | 79.86 | – | Unverified
2 | shufflenet-v2 (T: resnet-32x4, S: shufflenet-v2) | Top-1 Accuracy (%) | 78.76 | – | Unverified
3 | MV-MR (T: CLIP/ViT-B-16, S: resnet50) | Top-1 Accuracy (%) | 78.6 | – | Unverified
4 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 Accuracy (%) | 78.28 | – | Unverified
5 | resnet8x4 (T: resnet32x4, S: resnet8x4 [modified]) | Top-1 Accuracy (%) | 78.08 | – | Unverified
6 | ReviewKD++ (T: resnet-32x4, S: shufflenet-v2) | Top-1 Accuracy (%) | 77.93 | – | Unverified
7 | ReviewKD++ (T: resnet-32x4, S: shufflenet-v1) | Top-1 Accuracy (%) | 77.68 | – | Unverified
8 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 Accuracy (%) | 77.5 | – | Unverified
9 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 Accuracy (%) | 76.68 | – | Unverified
10 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 Accuracy (%) | 76.31 | – | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | LSHFM (T: ResNet101, S: ResNet50) | mAP | 93.17 | – | Unverified
2 | LSHFM (T: ResNet101, S: MobileNetV2) | mAP | 90.14 | – | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TIE-KD (T: Adabins, S: MobileNetV2) | RMSE | 2.43 | – | Unverified