Knowledge Distillation

Knowledge distillation is the process of transferring knowledge from a large model to a smaller one. While large models (such as very deep neural networks or ensembles of many models) have higher knowledge capacity than small models, this capacity might not be fully utilized.
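
As a rough illustration of what "transferring knowledge" can look like in practice, the sketch below implements one common formulation: the student is trained to match the teacher's temperature-softened output distribution, blended with the usual cross-entropy on the true label. The function names, the `temperature` and `alpha` parameters, and their default values are assumptions chosen for illustration; the papers listed on this page use many different objectives.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over a list of logits, softened by dividing by `temperature`."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.5):
    """Blend a soft-target term (teacher vs. student) with hard-label cross-entropy.

    `temperature` and `alpha` are illustrative hyperparameters (assumed values).
    """
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    # Scale by T^2 so the soft-term gradients stay comparable across temperatures.
    soft_term = kl_divergence(soft_teacher, soft_student) * temperature ** 2
    # Standard cross-entropy of the student against the true class index.
    hard_term = -math.log(softmax(student_logits)[label])
    return alpha * soft_term + (1 - alpha) * hard_term

# Hypothetical logits for a 3-class problem; class 0 is the true label.
teacher = [6.0, 2.0, -1.0]
student = [3.0, 1.5, 0.0]
loss = distillation_loss(student, teacher, label=0)
```

With `alpha=1.0` the loss depends only on how closely the student matches the teacher, and it is zero when the two sets of logits are identical.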

Papers

Title	Date	Tasks	Status	Hype
Adversarially Robust and Explainable Model Compression with On-Device Personalization for Text Classification	Jan 10, 2021	Adversarial Robustness, General Classification	Unverified	0
Spending Your Winning Lottery Better After Drawing It	Jan 8, 2021	Knowledge Distillation	Code Available	0
Knowledge Distillation in Iterative Generative Models for Improved Sampling Speed	Jan 7, 2021	Denoising, Image Generation	Code Available	1
MSD: Saliency-aware Knowledge Distillation for Multimodal Understanding	Jan 6, 2021	Knowledge Distillation, Meta-Learning	Unverified	0
Label Augmentation via Time-based Knowledge Distillation for Financial Anomaly Detection	Jan 5, 2021	Anomaly Detection, Knowledge Distillation	Unverified	0
Self-Mutual Distillation Learning for Continuous Sign Language Recognition	Jan 1, 2021	Knowledge Distillation, Sign Language Recognition	Code Available	1
FLAR: A Unified Prototype Framework for Few-Sample Lifelong Active Recognition	Jan 1, 2021	Knowledge Distillation, Lifelong learning	Unverified	0
Unpaired Learning for Deep Image Deraining With Rain Direction Regularizer	Jan 1, 2021	Knowledge Distillation, Rain Removal	Unverified	0
Kernel Methods in Hyperbolic Spaces	Jan 1, 2021	Few-Shot Learning, Image Classification	Unverified	0
Exploring Inter-Channel Correlation for Diversity-Preserved Knowledge Distillation	Jan 1, 2021	Diversity, Knowledge Distillation	Code Available	1
Active Learning for Lane Detection: A Knowledge Distillation Approach	Jan 1, 2021	2D Object Detection, Active Learning	Unverified	0
Student Customized Knowledge Distillation: Bridging the Gap Between Student and Teacher	Jan 1, 2021	Image Classification	Unverified	0
Improving De-Raining Generalization via Neural Reorganization	Jan 1, 2021	Knowledge Distillation	Unverified	0
Distilling Global and Local Logits With Densely Connected Relations	Jan 1, 2021	Image Classification	Code Available	0
Rethinking Soft Labels for Knowledge Distillation: A Bias–Variance Tradeoff Perspective	Jan 1, 2021	Knowledge Distillation	Unverified	0
Disentanglement, Visualization and Analysis of Complex Features in DNNs	Jan 1, 2021	Disentanglement, Knowledge Distillation	Unverified	0
Improve Object Detection with Feature-based Knowledge Distillation: Towards Accurate and Efficient Detectors	Jan 1, 2021	Image Classification	Code Available	1
Can Students Outperform Teachers in Knowledge Distillation based Model Compression?	Jan 1, 2021	Knowledge Distillation, Model Compression	Unverified	0
Contextual Knowledge Distillation for Transformer Compression	Jan 1, 2021	Knowledge Distillation, Language Modeling	Unverified	0
Explicit Connection Distillation	Jan 1, 2021	Image Classification	Unverified	0
Knowledge distillation via softmax regression representation learning	Jan 1, 2021	Knowledge Distillation, Model Compression	Unverified	0
Knowledge Distillation based Ensemble Learning for Neural Machine Translation	Jan 1, 2021	Ensemble Learning, Knowledge Distillation	Unverified	0
Learning from deep model via exploring local targets	Jan 1, 2021	Knowledge Distillation, model	Unverified	0
Understanding Adversarial Attacks on Autoencoders	Jan 1, 2021	Compressive Sensing, Knowledge Distillation	Unverified	0
Long Live the Lottery: The Existence of Winning Tickets in Lifelong Learning	Jan 1, 2021	Class Incremental Learning	Unverified	0
Robust Overfitting may be mitigated by properly learned smoothening	Jan 1, 2021	Knowledge Distillation	Unverified	0
Don't be picky, all students in the right family can learn from good teachers	Jan 1, 2021	All, Bayesian Optimization	Unverified	0
Understanding Knowledge Distillation	Jan 1, 2021	Knowledge Distillation	Unverified	0
Unified Mandarin TTS Front-end Based on Distilled BERT Model	Dec 31, 2020	Knowledge Distillation, Language Modeling	Code Available	1
Towards Zero-Shot Knowledge Distillation for Natural Language Processing	Dec 31, 2020	Knowledge Distillation, Model Compression	Unverified	0
Fully Synthetic Data Improves Neural Machine Translation with Knowledge Distillation	Dec 31, 2020	Knowledge Distillation, Machine Translation	Unverified	0
Knowledge Distillation with Adaptive Asymmetric Label Sharpening for Semi-supervised Fracture Detection in Chest X-rays	Dec 30, 2020	Fracture detection, Knowledge Distillation	Unverified	0
Understanding and Improving Lexical Choice in Non-Autoregressive Translation	Dec 29, 2020	Knowledge Distillation, Translation	Unverified	0
CascadeBERT: Accelerating Inference of Pre-trained Language Models via Calibrated Complete Models Cascade	Dec 29, 2020	Knowledge Distillation, Model Selection	Code Available	1
Learning Light-Weight Translation Models from Deep Transformer	Dec 27, 2020	Knowledge Distillation, Machine Translation	Code Available	1
ALP-KD: Attention-Based Layer Projection for Knowledge Distillation	Dec 27, 2020	Knowledge Distillation	Unverified	0
Towards a Universal Continuous Knowledge Base	Dec 25, 2020	Knowledge Distillation, text-classification	Unverified	0
Future-Guided Incremental Transformer for Simultaneous Translation	Dec 23, 2020	Knowledge Distillation, Translation	Unverified	0
AttentionLite: Towards Efficient Self-Attention Models for Vision	Dec 21, 2020	Knowledge Distillation	Unverified	0
Diverse Knowledge Distillation for End-to-End Person Search	Dec 21, 2020	Human Detection, Knowledge Distillation	Unverified	0
Computation-Efficient Knowledge Distillation via Uncertainty-Aware Mixup	Dec 17, 2020	Informativeness, Knowledge Distillation	Code Available	1
Invariant Teacher and Equivariant Student for Unsupervised 3D Human Pose Estimation	Dec 17, 2020	3D Human Pose Estimation, Knowledge Distillation	Code Available	1
Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning	Dec 17, 2020	Deep Learning, Knowledge Distillation	Unverified	0
Distilling Optimal Neural Networks: Rapid Search in Diverse Spaces	Dec 16, 2020	GPU, Knowledge Distillation	Unverified	0
Wasserstein Contrastive Representation Distillation	Dec 15, 2020	Contrastive Learning, Knowledge Distillation	Unverified	0
LRC-BERT: Latent-representation Contrastive Knowledge Distillation for Natural Language Understanding	Dec 14, 2020	Contrastive Learning, Knowledge Distillation	Unverified	0
Periocular Embedding Learning with Consistent Knowledge Distillation from Face	Dec 12, 2020	Knowledge Distillation, Prediction	Unverified	0
Improving Task-Agnostic BERT Distillation with Layer Mapping Search	Dec 11, 2020	Knowledge Distillation	Unverified	0
Reinforced Multi-Teacher Selection for Knowledge Distillation	Dec 11, 2020	GPU, Knowledge Distillation	Unverified	0
Large-Scale Generative Data-Free Distillation	Dec 10, 2020	Knowledge Distillation, Model Compression	Unverified	0

Datasets: ImageNet, CIFAR-100, COCO (Common Objects in Context), COCO 2017 val, PASCAL VOC, KITTI

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	ScaleKD (T:BEiT-L S:ViT-B/14)	Top-1 accuracy %	86.43	—	Unverified
2	ScaleKD (T:Swin-L S:ViT-B/16)	Top-1 accuracy %	85.53	—	Unverified
3	ScaleKD (T:Swin-L S:ViT-S/16)	Top-1 accuracy %	83.93	—	Unverified
4	ScaleKD (T:Swin-L S:Swin-T)	Top-1 accuracy %	83.8	—	Unverified
5	KD++(T: regnety-16GF S:ViT-B)	Top-1 accuracy %	83.6	—	Unverified
6	VkD (T:RegNety 160 S:DeiT-S)	Top-1 accuracy %	82.9	—	Unverified
7	SpectralKD (T:Swin-S S:Swin-T)	Top-1 accuracy %	82.7	—	Unverified
8	ScaleKD (T:Swin-L S:ResNet-50)	Top-1 accuracy %	82.55	—	Unverified
9	DiffKD (T:Swin-L S: Swin-T)	Top-1 accuracy %	82.5	—	Unverified
10	DIST (T: Swin-L S: Swin-T)	Top-1 accuracy %	82.3	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	SRD (T:resnet-32x4, S:shufflenet-v2)	Top-1 Accuracy (%)	79.86	—	Unverified
2	shufflenet-v2(T:resnet-32x4, S:shufflenet-v2)	Top-1 Accuracy (%)	78.76	—	Unverified
3	MV-MR (T: CLIP/ViT-B-16 S: resnet50)	Top-1 Accuracy (%)	78.6	—	Unverified
4	resnet8x4 (T: resnet32x4 S: resnet8x4)	Top-1 Accuracy (%)	78.28	—	Unverified
5	resnet8x4 (T: resnet32x4 S: resnet8x4 [modified])	Top-1 Accuracy (%)	78.08	—	Unverified
6	ReviewKD++(T:resnet-32x4, S:shufflenet-v2)	Top-1 Accuracy (%)	77.93	—	Unverified
7	ReviewKD++(T:resnet-32x4, S:shufflenet-v1)	Top-1 Accuracy (%)	77.68	—	Unverified
8	resnet8x4 (T: resnet32x4 S: resnet8x4)	Top-1 Accuracy (%)	77.5	—	Unverified
9	resnet8x4 (T: resnet32x4 S: resnet8x4)	Top-1 Accuracy (%)	76.68	—	Unverified
10	resnet8x4 (T: resnet32x4 S: resnet8x4)	Top-1 Accuracy (%)	76.31	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	LSHFM (T: ResNet101 S: ResNet50)	mAP	77.16	—	Unverified
2	LSHFM (T: ResNet101 S: MobileNetV2)	mAP	73.73	—	Unverified
3	ADLIK-Faster (T: Faster R-CNN vit-base S: Faster R-CNN deit-small)	box AP	47.6	—	Unverified
4	ADLIK-Mask (T: Mask R-CNN vit-base S: Mask R-CNN deit-small)	mask AP	42.4	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	ReviewKD++(T: faster rcnn(resnet101), S:faster rcnn(resnet50))	AP@0.5	61.8	—	Unverified
2	ReviewKD++(T: faster rcnn(resnet101), S:faster rcnn(resnet18))	AP@0.5	57.96	—	Unverified
3	ReviewKD++(T: faster rcnn(resnet101), S:faster rcnn(mobilenet-v2))	AP@0.5	55.18	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	LSHFM (T: ResNet101 S: ResNet50)	mAP	93.17	—	Unverified
2	LSHFM (T: ResNet101 S: MobileNetV2)	mAP	90.14	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	TIE-KD (T: Adabins S: MobileNetV2)	RMSE	2.43	—	Unverified