
Knowledge Distillation

Knowledge distillation is the process of transferring knowledge from a large model to a smaller one. While large models (such as very deep neural networks or ensembles of many models) have higher knowledge capacity than small models, this capacity might not be fully utilized.
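
In its most common form (Hinton-style soft-target distillation), a smaller student model is trained against a weighted mix of the ground-truth labels and the teacher's temperature-softened output distribution. A minimal PyTorch sketch of that loss follows; the temperature T and weight alpha are illustrative defaults, not values taken from any paper listed below.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Hinton-style KD loss: weighted sum of soft (teacher) and hard (label) terms.

    T and alpha are illustrative hyperparameters, not taken from any paper above.
    """
    # Soft targets: KL divergence between temperature-softened distributions.
    # The T*T factor keeps gradient magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Usage: the teacher runs in eval mode with no gradient; only the student updates.
# teacher.eval()
# with torch.no_grad():
#     teacher_logits = teacher(x)
# loss = distillation_loss(student(x), teacher_logits, y)
# loss.backward()
```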

Papers

Showing 4001–4050 of 4240 papers

Title | Status | Hype
Random Path Selection for Continual Learning | Code | 0
Knowledge Extraction with No Observable Data | Code | 0
Online Knowledge Distillation with Diverse Peers | Code | 0
Towards Oracle Knowledge Distillation with Neural Architecture Search | – | 0
Distributed Soft Actor-Critic with Multivariate Reward Representation and Knowledge Distillation | Code | 0
QKD: Quantization-aware Knowledge Distillation | – | 0
Data-Driven Compression of Convolutional Neural Networks | – | 0
Hearing Lips: Improving Lip Reading by Distilling Speech Recognizers | – | 0
Few Shot Network Compression via Cross Distillation | Code | 0
Search to Distill: Pearls are Everywhere but not the Eyes | – | 0
Neural Network Pruning with Residual-Connections and Limited-Data | Code | 0
Towards Making Deep Transfer Learning Never Hurt | – | 0
Data Efficient Stagewise Knowledge Distillation | Code | 0
Collaborative Distillation for Top-N Recommendation | – | 0
Knowledge Representing: Efficient, Sparse Representation of Prior Knowledge for Knowledge Distillation | – | 0
Graph Representation Learning via Multi-task Knowledge Distillation | – | 0
Knowledge Distillation in Document Retrieval | – | 0
MKD: a Multi-Task Knowledge Distillation Approach for Pretrained Language Models | – | 0
Knowledge Distillation for Incremental Learning in Semantic Segmentation | – | 0
Deep geometric knowledge distillation with graphs | Code | 0
Teacher-Student Training for Robust Tacotron-based TTS | – | 0
Understanding Knowledge Distillation in Non-autoregressive Machine Translation | – | 0
Microsoft Research Asia's Systems for WMT19 | – | 0
Weakly Supervised Cross-lingual Semantic Relation Classification via Knowledge Distillation | – | 0
ESPnet How2 Speech Translation System for IWSLT 2019: Pre-training, Knowledge Distillation, and Going Deeper | – | 0
Natural Language Generation for Effective Knowledge Distillation | Code | 0
Distilling Pixel-Wise Feature Similarities for Semantic Segmentation | – | 0
A Simple but Effective BERT Model for Dialog State Tracking on Resource-Limited Systems | – | 0
MOD: A Deep Mixture Model with Online Knowledge Distillation for Large Scale Video Temporal Concept Localization | Code | 0
Variational Student: Learning Compact and Sparser Networks in Knowledge Distillation Framework | – | 0
SeCoST: Sequential Co-supervision for Large Scale Weakly Labeled Audio Event Detection | – | 0
An Empirical Study of Efficient ASR Rescoring with Transformers | – | 0
Adversarial Feature Alignment: Avoid Catastrophic Forgetting in Incremental Task Lifelong Learning | – | 0
Model Compression with Two-stage Multi-teacher Knowledge Distillation for Web Question Answering System | – | 0
A Generalized and Robust Method Towards Practical Gaze Estimation on Smart Phone | – | 0
VarGFaceNet: An Efficient Variable Group Convolutional Neural Network for Lightweight Face Recognition | Code | 0
Noise as a Resource for Learning in Knowledge Distillation | – | 0
Cross-modal knowledge distillation for action recognition | – | 0
Knowledge Distillation from Internal Representations | – | 0
Distilling BERT into Simple Neural Networks with Unlabeled Transfer Data | – | 0
On the Efficacy of Knowledge Distillation | – | 0
AntMan: Sparse Low-Rank Compression to Accelerate RNN Inference | – | 0
Improving Word Embedding Factorization for Compression Using Distilled Nonlinear Neural Decomposition | – | 0
A Bayesian Optimization Framework for Neural Network Compression | – | 0
Training convolutional neural networks with cheap convolutions and online distillation | Code | 0
Compact Trilinear Interaction for Visual Question Answering | Code | 0
Proactive Sequence Generator via Knowledge Acquisition | – | 0
Self-Knowledge Distillation Adversarial Attack | – | 0
Distilled embedding: non-linear embedding factorization using knowledge distillation | – | 0
Collaborative Inter-agent Knowledge Distillation for Reinforcement Learning | – | 0
Page 81 of 85

Benchmark Results

Entries give the teacher–student pair as (T: teacher, S: student).

# | Model | Metric | Claimed | Verified | Status
1 | ScaleKD (T: BEiT-L, S: ViT-B/14) | Top-1 accuracy (%) | 86.43 | – | Unverified
2 | ScaleKD (T: Swin-L, S: ViT-B/16) | Top-1 accuracy (%) | 85.53 | – | Unverified
3 | ScaleKD (T: Swin-L, S: ViT-S/16) | Top-1 accuracy (%) | 83.93 | – | Unverified
4 | ScaleKD (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 83.8 | – | Unverified
5 | KD++ (T: RegNetY-16GF, S: ViT-B) | Top-1 accuracy (%) | 83.6 | – | Unverified
6 | VkD (T: RegNetY-160, S: DeiT-S) | Top-1 accuracy (%) | 82.9 | – | Unverified
7 | SpectralKD (T: Swin-S, S: Swin-T) | Top-1 accuracy (%) | 82.7 | – | Unverified
8 | ScaleKD (T: Swin-L, S: ResNet-50) | Top-1 accuracy (%) | 82.55 | – | Unverified
9 | DiffKD (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 82.5 | – | Unverified
10 | DIST (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 82.3 | – | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SRD (T: resnet32x4, S: ShuffleNetV2) | Top-1 accuracy (%) | 79.86 | – | Unverified
2 | ShuffleNetV2 (T: resnet32x4, S: ShuffleNetV2) | Top-1 accuracy (%) | 78.76 | – | Unverified
3 | MV-MR (T: CLIP/ViT-B-16, S: ResNet-50) | Top-1 accuracy (%) | 78.6 | – | Unverified
4 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 78.28 | – | Unverified
5 | resnet8x4 (T: resnet32x4, S: resnet8x4 [modified]) | Top-1 accuracy (%) | 78.08 | – | Unverified
6 | ReviewKD++ (T: resnet32x4, S: ShuffleNetV2) | Top-1 accuracy (%) | 77.93 | – | Unverified
7 | ReviewKD++ (T: resnet32x4, S: ShuffleNetV1) | Top-1 accuracy (%) | 77.68 | – | Unverified
8 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 77.5 | – | Unverified
9 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 76.68 | – | Unverified
10 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 76.31 | – | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | LSHFM (T: ResNet-101, S: ResNet-50) | mAP | 93.17 | – | Unverified
2 | LSHFM (T: ResNet-101, S: MobileNetV2) | mAP | 90.14 | – | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TIE-KD (T: AdaBins, S: MobileNetV2) | RMSE | 2.43 | – | Unverified
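
For reference, the Top-1 accuracy figures above are simply the percentage of validation samples whose highest-scoring class matches the label. A minimal sketch follows; the model and data loader are placeholders, and this is not the site's verification pipeline.

```python
import torch

@torch.no_grad()
def top1_accuracy(model, loader, device="cuda"):
    """Fraction of samples whose argmax prediction matches the label."""
    model.eval()
    correct, total = 0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        preds = model(images).argmax(dim=-1)
        correct += (preds == labels).sum().item()
        total += labels.numel()
    return 100.0 * correct / total  # reported as a percentage, as in the tables
```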