Knowledge Distillation

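Knowledge distillation is the process of transferring knowledge from a large model to a smaller one. Large models (such as very deep neural networks or ensembles of many models) have a higher knowledge capacity than small models, but that capacity may not be fully utilized, and evaluating the model costs the same no matter how much of it is actually used. Distillation trains a smaller "student" model to reproduce the behavior of the larger "teacher" model. The result is cheaper to evaluate and can be deployed on less powerful hardware, such as mobile devices.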
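As an illustration of how a student can learn from a teacher, here is one common form of the training objective: the soft-target loss popularized by Hinton et al. (2015). The student is trained on a weighted mix of ordinary cross-entropy against the true label and a KL divergence between the teacher's and the student's temperature-softened output distributions. This is a minimal pure-Python sketch for a single example, not the method of any paper listed on this page. The function names, the defaults `temperature=4.0` and `alpha=0.5`, and the convention that `alpha` weights the hard-label term are illustrative assumptions. Real training code would compute the same quantities on batched tensors in a deep learning framework.

```python
import math
from typing import Sequence

# Hypothetical helper names and default hyperparameters, chosen for illustration.


def softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Temperature-scaled softmax; higher temperatures give softer distributions."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    label: int,
    temperature: float = 4.0,
    alpha: float = 0.5,
) -> float:
    """Weighted sum of hard-label cross-entropy and soft-target KL divergence.

    The KL term is multiplied by temperature**2 so that its gradient magnitude
    stays comparable to the hard-label term as the temperature changes.
    """
    # Hard-label term: ordinary cross-entropy of the student at temperature 1.
    hard = -math.log(softmax(student_logits)[label])
    # Soft-target term: match the teacher's softened output distribution.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = kl_divergence(p_teacher, p_student) * temperature ** 2
    return alpha * hard + (1.0 - alpha) * soft
```

For example, `distillation_loss([3.0, 1.0, 0.5], [8.0, 2.0, 1.0], label=0)` scores a student whose logits agree with the teacher on the top class but are less confident. Raising `temperature` exposes more of the teacher's relative preferences among the wrong classes, which is the extra signal a hard label alone does not carry.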

Papers

Showing 4001–4050 of 4240 papers

Title	Date	Tasks	Status	Hype
Understanding Knowledge Distillation in Non-autoregressive Machine Translation	Nov 7, 2019	Knowledge Distillation, Machine Translation	Unverified	0
Data Diversification: A Simple Strategy For Neural Machine Translation	Nov 5, 2019	Knowledge Distillation, Machine Translation	Code available	1
ESPnet How2 Speech Translation System for IWSLT 2019: Pre-training, Knowledge Distillation, and Going Deeper	Nov 1, 2019	All, Knowledge Distillation	Unverified	0
Weakly Supervised Cross-lingual Semantic Relation Classification via Knowledge Distillation	Nov 1, 2019	Classification, Cross-Lingual Transfer	Unverified	0
Natural Language Generation for Effective Knowledge Distillation	Nov 1, 2019	Knowledge Distillation, Linguistic Acceptability	Code available	0
Distilling Pixel-Wise Feature Similarities for Semantic Segmentation	Oct 31, 2019	Knowledge Distillation, Neural Network Compression	Unverified	0
A Simple but Effective BERT Model for Dialog State Tracking on Resource-Limited Systems	Oct 28, 2019	dialog state tracking, Dialogue State Tracking	Unverified	0
MOD: A Deep Mixture Model with Online Knowledge Distillation for Large Scale Video Temporal Concept Localization	Oct 27, 2019	Knowledge Distillation, Video Understanding	Code available	0
Variational Student: Learning Compact and Sparser Networks in Knowledge Distillation Framework	Oct 26, 2019	Knowledge Distillation, Variational Inference	Unverified	0
Secost: Sequential co-supervision for large scale weakly labeled audio event detection	Oct 25, 2019	Event Detection, Knowledge Distillation	Unverified	0
An Empirical Study of Efficient ASR Rescoring with Transformers	Oct 24, 2019	Knowledge Distillation, Language Modeling	Unverified	0
Adversarial Feature Alignment: Avoid Catastrophic Forgetting in Incremental Task Lifelong Learning	Oct 24, 2019	Continual Learning, image-classification	Unverified	0
Contrastive Representation Distillation	Oct 23, 2019	Contrastive Learning, Knowledge Distillation	Code available	1
Model Compression with Two-stage Multi-teacher Knowledge Distillation for Web Question Answering System	Oct 18, 2019	General Knowledge, Knowledge Distillation	Unverified	0
A Generalized and Robust Method Towards Practical Gaze Estimation on Smart Phone	Oct 16, 2019	Gaze Estimation, Knowledge Distillation	Unverified	0
Noise as a Resource for Learning in Knowledge Distillation	Oct 11, 2019	Knowledge Distillation	Unverified	0
VarGFaceNet: An Efficient Variable Group Convolutional Neural Network for Lightweight Face Recognition	Oct 11, 2019	Face Detection, Face Identification	Code available	0
Cross-modal knowledge distillation for action recognition	Oct 10, 2019	Action Recognition, Knowledge Distillation	Unverified	0
FedMD: Heterogenous Federated Learning via Model Distillation	Oct 8, 2019	Federated Learning, Knowledge Distillation	Code available	1
Knowledge Distillation from Internal Representations	Oct 8, 2019	Knowledge Distillation	Unverified	0
Distilling BERT into Simple Neural Networks with Unlabeled Transfer Data	Oct 4, 2019	Knowledge Distillation, NER	Unverified	0
On the Efficacy of Knowledge Distillation	Oct 3, 2019	Knowledge Distillation	Unverified	0
Improving Word Embedding Factorization for Compression Using Distilled Nonlinear Neural Decomposition	Oct 2, 2019	Knowledge Distillation, Language Modeling	Unverified	0
AntMan: Sparse Low-Rank Compression to Accelerate RNN inference	Oct 2, 2019	Knowledge Distillation, Low-rank compression	Unverified	0
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter	Oct 2, 2019	Hate Speech Detection, Knowledge Distillation	Code available	1
Distilled Split Deep Neural Networks for Edge-Assisted Real-Time Systems	Oct 1, 2019	Edge-computing, Image Classification	Code available	1
A Bayesian Optimization Framework for Neural Network Compression	Oct 1, 2019	Bayesian Optimization, Knowledge Distillation	Unverified	0
Distillation-Based Training for Multi-Exit Architectures	Oct 1, 2019	Knowledge Distillation	Code available	1
Training convolutional neural networks with cheap convolutions and online distillation	Sep 28, 2019	Knowledge Distillation	Code available	0
Compact Trilinear Interaction for Visual Question Answering	Sep 26, 2019	Benchmarking, Knowledge Distillation	Code available	0
Distilled embedding: non-linear embedding factorization using knowledge distillation	Sep 25, 2019	Knowledge Distillation, Machine Translation	Unverified	0
Collaborative Inter-agent Knowledge Distillation for Reinforcement Learning	Sep 25, 2019	Decision Making, Knowledge Distillation	Unverified	0
Proactive Sequence Generator via Knowledge Acquisition	Sep 25, 2019	de-en, Knowledge Distillation	Unverified	0
XD: Cross-lingual Knowledge Distillation for Polyglot Sentence Embeddings	Sep 25, 2019	Knowledge Distillation, Language Modeling	Unverified	0
SELF-KNOWLEDGE DISTILLATION ADVERSARIAL ATTACK	Sep 25, 2019	Adversarial Attack, Knowledge Distillation	Unverified	0
Revisiting Knowledge Distillation via Label Smoothing Regularization	Sep 25, 2019	Knowledge Distillation, Self-Knowledge Distillation	Code available	0
Extremely Small BERT Models from Mixed-Vocabulary Training	Sep 25, 2019	Knowledge Distillation, Language Modelling	Unverified	0
Technical report on Conversational Question Answering	Sep 24, 2019	Conversational Question Answering, Data Augmentation	Unverified	0
FEED: Feature-level Ensemble for Knowledge Distillation	Sep 24, 2019	Knowledge Distillation	Unverified	0
TinyBERT: Distilling BERT for Natural Language Understanding	Sep 23, 2019	Knowledge Distillation, Language Modelling	Code available	0
Positive-Unlabeled Compression on the Cloud	Sep 21, 2019	GPU, Knowledge Distillation	Code available	2
Learning Lightweight Pedestrian Detector with Hierarchical Knowledge Distillation	Sep 20, 2019	Knowledge Distillation, Pedestrian Detection	Unverified	0
Ensemble Knowledge Distillation for Learning Improved and Efficient Networks	Sep 17, 2019	Ensemble Learning, General Classification	Code available	0
Knowledge Transfer Graph for Deep Collaborative Learning	Sep 10, 2019	Knowledge Distillation, Transfer Learning	Code available	0
Accelerating Transformer Decoding via a Hybrid of Self-attention and Recurrent Neural Network	Sep 5, 2019	Decoder, Knowledge Distillation	Unverified	0
Knowledge distillation for optimization of quantized deep neural networks	Sep 4, 2019	Knowledge Distillation	Unverified	0
Knowledge Distillation for End-to-End Person Search	Sep 3, 2019	Knowledge Distillation, Model Compression	Code available	0
Online Sensor Hallucination via Knowledge Distillation for Multimodal Image Classification	Aug 28, 2019	Classification, Decision Making	Unverified	0
Patient Knowledge Distillation for BERT Model Compression	Aug 25, 2019	Knowledge Distillation, model	Code available	0
Well-Read Students Learn Better: On the Importance of Pre-training Compact Models	Aug 23, 2019	Knowledge Distillation, Language Modelling	Code available	2


Benchmark Results

Datasets: ImageNet, CIFAR-100, COCO (Common Objects in Context), COCO 2017 val, PASCAL VOC, KITTI

#	Model	Metric	Claimed	Verified	Status
1	ScaleKD (T:BEiT-L S:ViT-B/14)	Top-1 accuracy %	86.43	—	Unverified
2	ScaleKD (T:Swin-L S:ViT-B/16)	Top-1 accuracy %	85.53	—	Unverified
3	ScaleKD (T:Swin-L S:ViT-S/16)	Top-1 accuracy %	83.93	—	Unverified
4	ScaleKD (T:Swin-L S:Swin-T)	Top-1 accuracy %	83.8	—	Unverified
5	KD++(T: regnety-16GF S:ViT-B)	Top-1 accuracy %	83.6	—	Unverified
6	VkD (T:RegNety 160 S:DeiT-S)	Top-1 accuracy %	82.9	—	Unverified
7	SpectralKD (T:Swin-S S:Swin-T)	Top-1 accuracy %	82.7	—	Unverified
8	ScaleKD (T:Swin-L S:ResNet-50)	Top-1 accuracy %	82.55	—	Unverified
9	DiffKD (T:Swin-L S: Swin-T)	Top-1 accuracy %	82.5	—	Unverified
10	DIST (T: Swin-L S: Swin-T)	Top-1 accuracy %	82.3	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	SRD (T:resnet-32x4, S:shufflenet-v2)	Top-1 Accuracy (%)	79.86	—	Unverified
2	shufflenet-v2(T:resnet-32x4, S:shufflenet-v2)	Top-1 Accuracy (%)	78.76	—	Unverified
3	MV-MR (T: CLIP/ViT-B-16 S: resnet50)	Top-1 Accuracy (%)	78.6	—	Unverified
4	resnet8x4 (T: resnet32x4 S: resnet8x4)	Top-1 Accuracy (%)	78.28	—	Unverified
5	resnet8x4 (T: resnet32x4 S: resnet8x4 [modified])	Top-1 Accuracy (%)	78.08	—	Unverified
6	ReviewKD++(T:resnet-32x4, S:shufflenet-v2)	Top-1 Accuracy (%)	77.93	—	Unverified
7	ReviewKD++(T:resnet-32x4, S:shufflenet-v1)	Top-1 Accuracy (%)	77.68	—	Unverified
8	resnet8x4 (T: resnet32x4 S: resnet8x4)	Top-1 Accuracy (%)	77.5	—	Unverified
9	resnet8x4 (T: resnet32x4 S: resnet8x4)	Top-1 Accuracy (%)	76.68	—	Unverified
10	resnet8x4 (T: resnet32x4 S: resnet8x4)	Top-1 Accuracy (%)	76.31	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	LSHFM (T: ResNet101 S: ResNet50)	mAP	77.16	—	Unverified
2	LSHFM (T: ResNet101 S: MobileNetV2)	mAP	73.73	—	Unverified
3	ADLIK-Faster (T: Faster R-CNN vit-base S: Faster R-CNN deit-small)	box AP	47.6	—	Unverified
4	ADLIK-Mask (T: Mask R-CNN vit-base S: Mask R-CNN deit-small)	mask AP	42.4	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	ReviewKD++(T: faster rcnn(resnet101), S:faster rcnn(resnet50))	AP@0.5	61.8	—	Unverified
2	ReviewKD++(T: faster rcnn(resnet101), S:faster rcnn(resnet18))	AP@0.5	57.96	—	Unverified
3	ReviewKD++(T: faster rcnn(resnet101), S:faster rcnn(mobilenet-v2))	AP@0.5	55.18	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	LSHFM (T: ResNet101 S: ResNet50)	mAP	93.17	—	Unverified
2	LSHFM (T: ResNet101 S: MobileNetV2)	mAP	90.14	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	TIE-KD (T: Adabins S: MobileNetV2)	RMSE	2.43	—	Unverified