
Knowledge Distillation

Knowledge distillation is the process of transferring knowledge from a large model to a smaller one. While large models (such as very deep neural networks or ensembles of many models) have a higher knowledge capacity than small models, this capacity may not be fully utilized, so a well-trained student can often recover much of the teacher's accuracy at a fraction of the inference cost.
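
The canonical recipe (Hinton et al., 2015) trains the student to match the teacher's temperature-softened output distribution while also fitting the hard labels. Below is a minimal PyTorch sketch of that loss; the function name and the temperature/alpha defaults are illustrative choices, not values taken from this page.

```python
import torch
import torch.nn.functional as F

def distillation_loss(
    student_logits: torch.Tensor,
    teacher_logits: torch.Tensor,
    labels: torch.Tensor,
    temperature: float = 4.0,
    alpha: float = 0.9,
) -> torch.Tensor:
    """Soft-target distillation loss (Hinton et al., 2015).

    Blends KL divergence between temperature-softened teacher and
    student distributions with ordinary cross-entropy on hard labels.
    `temperature` and `alpha` are illustrative defaults, not values
    from any paper listed on this page.
    """
    # Softened distributions: log-probs for the student, probs for the teacher.
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)

    # The T^2 factor keeps soft-target gradients on the same scale
    # as the hard-label term across temperatures.
    soft_loss = F.kl_div(soft_student, soft_teacher, reduction="batchmean")
    soft_loss = soft_loss * temperature ** 2

    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```

In practice the teacher runs in eval mode under `torch.no_grad()` to produce `teacher_logits`, and only the student's parameters receive gradient updates.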

Papers

Showing 3951–4000 of 4240 papers

Title | Status | Hype
------|--------|-----
Knowledge Distillation for Mobile Edge Computation Offloading | — | 0
LadaBERT: Lightweight Adaptation of BERT through Hybrid Model Compression | — | 0
Towards Non-task-specific Distillation of BERT via Sentence Representation Approximation | — | 0
Enhancing Review Comprehension with Domain-Specific Commonsense | — | 0
Knowledge as Priors: Cross-Modal Knowledge Generalization for Datasets without Superior Knowledge | — | 0
Spatio-Temporal Graph for Video Captioning with Knowledge Distillation | — | 0
SS-IL: Separated Softmax for Incremental Learning | — | 0
Analysis of Knowledge Transfer in Kernel Regime | — | 0
Squeezed Deep 6DoF Object Detection Using Knowledge Distillation | Code | 0
Synergic Adversarial Label Learning for Grading Retinal Diseases via Knowledge Distillation and Multi-task Learning | — | 0
A Survey of Methods for Low-Power Deep Learning and Computer Vision | — | 0
Teacher-Student chain for efficient semi-supervised histology image classification | — | 0
Knowledge distillation via adaptive instance normalization | — | 0
Pacemaker: Intermediate Teacher Knowledge Distillation For On-The-Fly Convolutional Neural Network | — | 0
Explaining Knowledge Distillation by Quantifying the Knowledge | — | 0
Distilling portable Generative Adversarial Networks for Image Translation | — | 0
Distill, Adapt, Distill: Training Small, In-Domain Models for Neural Machine Translation | — | 0
An Efficient Method of Training Small Models for Regression Problems with Knowledge Distillation | — | 0
Residual Knowledge Distillation | — | 0
Balancing Cost and Benefit with Tied-Multi Transformers | — | 0
The Microsoft Toolkit of Multi-Task Deep Neural Networks for Natural Language Understanding | — | 0
Self-Distillation Amplifies Regularization in Hilbert Space | — | 0
Content Based Singing Voice Extraction From a Musical Mixture | Code | 0
Meta-Learning across Meta-Tasks for Few-Shot Learning | — | 0
Regularized Evolutionary Population-Based Training | — | 0
Understanding and Improving Knowledge Distillation | — | 0
Unlabeled Data Deployment for Classification of Diabetic Retinopathy Images Using Knowledge Transfer | — | 0
Feature-map-level Online Adversarial Knowledge Distillation | — | 0
Periodic Intra-Ensemble Knowledge Distillation for Reinforcement Learning | Code | 0
Search for Better Students to Learn Distilled Knowledge | — | 0
MSE-Optimal Neural Network Initialization via Layer Fusion | Code | 0
Developing Multi-Task Recommendations with Long-Term Rewards via Policy Distilled Reinforcement Learning | — | 0
Generation-Distillation for Efficient Natural Language Understanding in Low-Data Settings | — | 0
Data Techniques For Online End-to-end Speech Recognition | — | 0
Lightweight 3D Human Pose Estimation Network Training Using Teacher-Student Learning | — | 0
A "Network Pruning Network" Approach to Deep Model Compression | — | 0
Uncertainty-Aware Multi-Shot Knowledge Distillation for Image-Based Object Re-Identification | — | 0
Noisy Machines: Understanding Noisy Neural Networks and Enhancing Robustness to Analog Hardware Errors Using Distillation | — | 0
AdaBERT: Task-Adaptive BERT Compression with Differentiable Neural Architecture Search | Code | 0
Learning Task-Agnostic Embedding of Multiple Black-Box Experts for Multi-Task Model Fusion | — | 0
Modeling Teacher-Student Techniques in Deep Neural Networks for Knowledge Distillation | — | 0
DeGAN: Data-Enriching GAN for Retrieving Representative Samples from a Trained Classifier | — | 0
Data-Free Adversarial Distillation | Code | 0
The State of Knowledge Distillation for Classification | Code | 0
Joint Architecture and Knowledge Distillation in CNN for Chinese Text Recognition | — | 0
Iterative Dual Domain Adaptation for Neural Machine Translation | — | 0
Explaining Sequence-Level Knowledge Distillation as Data-Augmentation for Neural Machine Translation | — | 0
Acquiring Knowledge from Pre-trained Model to Neural Machine Translation | — | 0
QUEST: Quantized embedding space for transferring knowledge | Code | 0
Efficient Convolutional Neural Networks for Depth-Based Multi-Person Pose Estimation | — | 0
Page 80 of 85

Benchmark Results

In the model names below, "T:" denotes the teacher and "S:" the student of each distillation pair.

# | Model | Metric | Claimed | Verified | Status
--|-------|--------|---------|----------|-------
1 | ScaleKD (T: BEiT-L, S: ViT-B/14) | Top-1 accuracy (%) | 86.43 | — | Unverified
2 | ScaleKD (T: Swin-L, S: ViT-B/16) | Top-1 accuracy (%) | 85.53 | — | Unverified
3 | ScaleKD (T: Swin-L, S: ViT-S/16) | Top-1 accuracy (%) | 83.93 | — | Unverified
4 | ScaleKD (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 83.8 | — | Unverified
5 | KD++ (T: RegNetY-16GF, S: ViT-B) | Top-1 accuracy (%) | 83.6 | — | Unverified
6 | VkD (T: RegNetY-160, S: DeiT-S) | Top-1 accuracy (%) | 82.9 | — | Unverified
7 | SpectralKD (T: Swin-S, S: Swin-T) | Top-1 accuracy (%) | 82.7 | — | Unverified
8 | ScaleKD (T: Swin-L, S: ResNet-50) | Top-1 accuracy (%) | 82.55 | — | Unverified
9 | DiffKD (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 82.5 | — | Unverified
10 | DIST (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 82.3 | — | Unverified

# | Model | Metric | Claimed | Verified | Status
--|-------|--------|---------|----------|-------
1 | SRD (T: resnet-32x4, S: shufflenet-v2) | Top-1 accuracy (%) | 79.86 | — | Unverified
2 | shufflenet-v2 (T: resnet-32x4, S: shufflenet-v2) | Top-1 accuracy (%) | 78.76 | — | Unverified
3 | MV-MR (T: CLIP/ViT-B-16, S: resnet50) | Top-1 accuracy (%) | 78.6 | — | Unverified
4 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 78.28 | — | Unverified
5 | resnet8x4 (T: resnet32x4, S: resnet8x4 [modified]) | Top-1 accuracy (%) | 78.08 | — | Unverified
6 | ReviewKD++ (T: resnet-32x4, S: shufflenet-v2) | Top-1 accuracy (%) | 77.93 | — | Unverified
7 | ReviewKD++ (T: resnet-32x4, S: shufflenet-v1) | Top-1 accuracy (%) | 77.68 | — | Unverified
8 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 77.5 | — | Unverified
9 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 76.68 | — | Unverified
10 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 76.31 | — | Unverified

# | Model | Metric | Claimed | Verified | Status
--|-------|--------|---------|----------|-------
1 | LSHFM (T: ResNet101, S: ResNet50) | mAP | 93.17 | — | Unverified
2 | LSHFM (T: ResNet101, S: MobileNetV2) | mAP | 90.14 | — | Unverified

# | Model | Metric | Claimed | Verified | Status
--|-------|--------|---------|----------|-------
1 | TIE-KD (T: AdaBins, S: MobileNetV2) | RMSE | 2.43 | — | Unverified