
Knowledge Distillation

Knowledge distillation is the process of transferring knowledge from a large model (the teacher) to a smaller one (the student). While large models, such as very deep neural networks or ensembles of many models, have a higher knowledge capacity than small models, that capacity is often not fully utilized; a student trained to mimic the teacher's outputs can therefore retain much of the teacher's accuracy at a fraction of the computational cost.
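As a concrete illustration, the classic formulation (Hinton et al., 2015) trains the student on a weighted combination of the usual cross-entropy loss on hard labels and a KL-divergence term that matches the student's temperature-softened output distribution to the teacher's. Below is a minimal PyTorch sketch of that loss; the `temperature` and `alpha` defaults are illustrative assumptions, not settings taken from any paper listed on this page.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.9):
    """Classic KD loss: alpha * soft-target KL + (1 - alpha) * hard-label CE.

    temperature and alpha are illustrative defaults (assumptions), not
    values used by any paper listed on this page.
    """
    # Soften both output distributions with the same temperature.
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)

    # The T^2 factor keeps the soft-target gradients on the same scale
    # as the hard-label gradients when the temperature changes.
    kd = F.kl_div(log_p_student, p_teacher,
                  reduction="batchmean") * temperature ** 2

    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

# Typical training step: the teacher is frozen, only the student learns.
# with torch.no_grad():
#     teacher_logits = teacher(images)
# loss = distillation_loss(student(images), teacher_logits, labels)
```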

Papers

Showing 4051–4100 of 4240 papers (page 82 of 85)

| Title | Status | Hype |
|---|---|---|
| Revisiting Knowledge Distillation via Label Smoothing Regularization | Code | 0 |
| XD: Cross-lingual Knowledge Distillation for Polyglot Sentence Embeddings | | 0 |
| Extremely Small BERT Models from Mixed-Vocabulary Training | | 0 |
| Technical report on Conversational Question Answering | | 0 |
| FEED: Feature-level Ensemble for Knowledge Distillation | | 0 |
| TinyBERT: Distilling BERT for Natural Language Understanding | Code | 0 |
| Learning Lightweight Pedestrian Detector with Hierarchical Knowledge Distillation | | 0 |
| Ensemble Knowledge Distillation for Learning Improved and Efficient Networks | Code | 0 |
| Knowledge Transfer Graph for Deep Collaborative Learning | Code | 0 |
| Accelerating Transformer Decoding via a Hybrid of Self-attention and Recurrent Neural Network | | 0 |
| Knowledge distillation for optimization of quantized deep neural networks | | 0 |
| Knowledge Distillation for End-to-End Person Search | Code | 0 |
| Online Sensor Hallucination via Knowledge Distillation for Multimodal Image Classification | | 0 |
| Patient Knowledge Distillation for BERT Model Compression | Code | 0 |
| Adversarial-Based Knowledge Distillation for Multi-Model Ensemble and Noisy Data Refinement | | 0 |
| Language Graph Distillation for Low-Resource Machine Translation | | 0 |
| Knowledge distillation for semi-supervised domain adaptation | | 0 |
| Adaptive Regularization of Labels | | 0 |
| Scalable Attentive Sentence-Pair Modeling via Distilled Sentence Embedding | Code | 0 |
| Effective Training of Convolutional Neural Networks with Low-bitwidth Weights and Activations | | 0 |
| Knowledge Consistency between Neural Networks and Beyond | | 0 |
| Learning Lightweight Lane Detection CNNs by Self Attention Distillation | Code | 0 |
| Self-Knowledge Distillation in Natural Language Processing | | 0 |
| GTCOM Neural Machine Translation Systems for WMT19 | | 0 |
| The NiuTrans Machine Translation Systems for WMT19 | | 0 |
| Baidu Neural Machine Translation Systems for WMT19 | | 0 |
| PANLP at MEDIQA 2019: Pre-trained Language Models, Transfer Learning and Knowledge Distillation | | 0 |
| Distill-to-Label: Weakly Supervised Instance Labeling Using Knowledge Distillation | | 0 |
| Distilled Siamese Networks for Visual Tracking | | 0 |
| Highlight Every Step: Knowledge Distillation via Collaborative Teaching | Code | 0 |
| Real-Time Correlation Tracking via Joint Model Compression and Transfer | Code | 0 |
| Lifelong GAN: Continual Learning for Conditional Image Generation | | 0 |
| Similarity-Preserving Knowledge Distillation | | 0 |
| Light Multi-segment Activation for Model Compression | Code | 0 |
| Learn Spelling from Teachers: Transferring Knowledge from Language Models to Sequence-to-Sequence Speech Recognition | | 0 |
| BAM! Born-Again Multi-Task Networks for Natural Language Understanding | Code | 0 |
| Graph-based Knowledge Distillation by Multi-head Attention Network | Code | 0 |
| Compression of Acoustic Event Detection Models With Quantized Distillation | | 0 |
| Reconstructing Perceived Images from Brain Activity by Visually-guided Cognitive Representation and Adversarial Learning | | 0 |
| Essence Knowledge Distillation for Speech Recognition | | 0 |
| Approximating Interactive Human Evaluation with Self-Play for Open-Domain Dialog Systems | Code | 0 |
| GAN-Knowledge Distillation for one-stage Object Detection | | 0 |
| Membership Privacy for Machine Learning Models Through Knowledge Transfer | | 0 |
| Divide and Conquer: Leveraging Intermediate Feature Representations for Quantized Training of Neural Networks | | 0 |
| Scalable Syntax-Aware Language Models Using Knowledge Distillation | | 0 |
| Efficient Evaluation-Time Uncertainty Estimation by Improved Distillation | | 0 |
| Incremental Classifier Learning Based on PEDCC-Loss and Cosine Distance | | 0 |
| Distilling Object Detectors with Fine-grained Feature Imitation | Code | 0 |
| Private Deep Learning with Teacher Ensembles | | 0 |
| Deep Face Recognition Model Compression via Knowledge Transfer and Distillation | | 0 |

Benchmark Results

Each table below corresponds to one benchmark. Entries name the distillation method with its teacher (T:) and student (S:) models, the metric, and the result claimed in the paper; none of the claims have been independently verified yet.

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | ScaleKD (T: BEiT-L, S: ViT-B/14) | Top-1 accuracy (%) | 86.43 | | Unverified |
| 2 | ScaleKD (T: Swin-L, S: ViT-B/16) | Top-1 accuracy (%) | 85.53 | | Unverified |
| 3 | ScaleKD (T: Swin-L, S: ViT-S/16) | Top-1 accuracy (%) | 83.93 | | Unverified |
| 4 | ScaleKD (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 83.8 | | Unverified |
| 5 | KD++ (T: RegNetY-16GF, S: ViT-B) | Top-1 accuracy (%) | 83.6 | | Unverified |
| 6 | VkD (T: RegNetY-160, S: DeiT-S) | Top-1 accuracy (%) | 82.9 | | Unverified |
| 7 | SpectralKD (T: Swin-S, S: Swin-T) | Top-1 accuracy (%) | 82.7 | | Unverified |
| 8 | ScaleKD (T: Swin-L, S: ResNet-50) | Top-1 accuracy (%) | 82.55 | | Unverified |
| 9 | DiffKD (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 82.5 | | Unverified |
| 10 | DIST (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 82.3 | | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | SRD (T: resnet-32x4, S: shufflenet-v2) | Top-1 accuracy (%) | 79.86 | | Unverified |
| 2 | shufflenet-v2 (T: resnet-32x4, S: shufflenet-v2) | Top-1 accuracy (%) | 78.76 | | Unverified |
| 3 | MV-MR (T: CLIP/ViT-B-16, S: resnet50) | Top-1 accuracy (%) | 78.6 | | Unverified |
| 4 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 78.28 | | Unverified |
| 5 | resnet8x4 (T: resnet32x4, S: resnet8x4 [modified]) | Top-1 accuracy (%) | 78.08 | | Unverified |
| 6 | ReviewKD++ (T: resnet-32x4, S: shufflenet-v2) | Top-1 accuracy (%) | 77.93 | | Unverified |
| 7 | ReviewKD++ (T: resnet-32x4, S: shufflenet-v1) | Top-1 accuracy (%) | 77.68 | | Unverified |
| 8 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 77.5 | | Unverified |
| 9 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 76.68 | | Unverified |
| 10 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 76.31 | | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | LSHFM (T: ResNet101, S: ResNet50) | mAP | 93.17 | | Unverified |
| 2 | LSHFM (T: ResNet101, S: MobileNetV2) | mAP | 90.14 | | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | TIE-KD (T: Adabins, S: MobileNetV2) | RMSE | 2.43 | | Unverified |