
Knowledge Distillation

Knowledge distillation is the process of transferring knowledge from a large model to a smaller one. While large models (such as very deep neural networks or ensembles of many models) have higher knowledge capacity than small models, this capacity might not be fully utilized. Distillation exploits this gap: the small student model is trained to match the large teacher's output distribution (its soft targets) rather than only the ground-truth labels, often recovering much of the teacher's accuracy at a fraction of the inference cost.
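As a concrete illustration, the sketch below implements the classic soft-target distillation loss from Hinton et al.'s "Distilling the Knowledge in a Neural Network", which most of the methods listed here build on. It assumes a PyTorch setup; the function name and the hyperparameter defaults (`T`, `alpha`) are illustrative choices, not values taken from any paper on this page.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Soft-target distillation loss (Hinton et al., 2015): a minimal sketch."""
    # Soft term: KL divergence between temperature-softened teacher and
    # student distributions. A higher T exposes the teacher's "dark
    # knowledge" (relative probabilities of incorrect classes); the T**2
    # factor keeps gradient magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T ** 2)
    # Hard term: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Usage sketch: the teacher is frozen, only the student is optimized.
# teacher.eval()
# with torch.no_grad():
#     teacher_logits = teacher(images)
# loss = distillation_loss(student(images), teacher_logits, labels)
```

Many of the papers below vary exactly these ingredients: what is matched (logits, features, attention maps), the divergence used (KL vs. MSE), and where the teacher's signal comes from.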

Papers

Showing 3351–3400 of 4240 papers

Title | Status | Hype
LENAS: Learning-based Neural Architecture Search and Ensemble for 3D Radiotherapy Dose Prediction | Code | 0
RefBERT: Compressing BERT by Referencing to Pre-computed Representations | — | 0
Generate, Annotate, and Learn: NLP with Synthetic Text | Code | 0
Does Knowledge Distillation Really Work? | Code | 1
Marginal Utility Diminishes: Exploring the Minimum Knowledge for BERT Knowledge Distillation | Code | 0
AKE-GNN: Effective Graph Learning with Adaptive Knowledge Exchange | — | 0
Knowledge distillation: A good teacher is patient and consistent | Code | 2
Distilling Image Classifiers in Object Detectors | Code | 1
XtremeDistilTransformers: Task Transfer for Task-agnostic Distillation | Code | 1
Learning by Distillation: A Self-Supervised Learning Framework for Optical Flow Estimation | — | 0
BERT Learns to Teach: Knowledge Distillation with Meta Learning | Code | 1
RoSearch: Search for Robust Student Architectures When Distilling Pre-trained Language Models | — | 0
Zero-Shot Knowledge Distillation from a Decision-Based Black-Box Model | Code | 1
Preservation of the Global Knowledge by Not-True Distillation in Federated Learning | Code | 1
Bidirectional Distillation for Top-K Recommender System | Code | 1
MergeDistill: Merging Pre-trained Language Models using Distillation | — | 0
ERNIE-Tiny: A Progressive Distillation Framework for Pretrained Transformer Compression | Code | 0
Not All Knowledge Is Created Equal: Mutual Distillation of Confident Knowledge | — | 0
Rejuvenating Low-Frequency Words: Making the Most of Parallel Data in Non-Autoregressive Translation | Code | 0
One Teacher is Enough? Pre-trained Language Model Distillation from Multiple Teachers | — | 0
Modality-specific Distillation | — | 0
Cost-effective Deployment of BERT Models in Serverless Environment | — | 0
Continual Learning for Neural Machine Translation | — | 0
Multi-Grained Knowledge Distillation for Named Entity Recognition | — | 0
Towards Quantifiable Dialogue Coherence Evaluation | Code | 1
Claim Matching Beyond English to Scale Global Fact-Checking | — | 0
Natural Statistics of Network Activations and Implications for Knowledge Distillation | — | 0
Reinforced Iterative Knowledge Distillation for Cross-Lingual Named Entity Recognition | — | 0
Greedy-layer Pruning: Speeding up Transformer Models for Natural Language Processing | Code | 0
Transformer-Based Source-Free Domain Adaptation | Code | 1
Knowledge Inheritance for Pre-trained Language Models | Code | 1
FReTAL: Generalizing Deepfake Detection using Knowledge Distillation and Representation Learning | — | 0
Not Far Away, Not So Close: Sample Efficient Nearest Neighbour Data Augmentation via MiniMax | Code | 0
Fair Feature Distillation for Visual Recognition | — | 0
Towards Understanding Knowledge Distillation | — | 0
How Does Distilled Data Complexity Impact the Quality and Confidence of Non-Autoregressive Machine Translation? | — | 0
Selective Knowledge Distillation for Neural Machine Translation | Code | 1
Joint-DetNAS: Upgrade Your Detector with NAS, Pruning and Dynamic Distillation | — | 0
Honest-but-Curious Nets: Sensitive Attributes of Private Inputs Can Be Secretly Coded into the Classifiers' Outputs | Code | 1
Real-time Monocular Depth Estimation with Sparse Supervision on Mobile | — | 0
KnowSR: Knowledge Sharing among Homogeneous Agents in Multi-agent Reinforcement Learning | — | 0
Experimenting with Knowledge Distillation techniques for performing Brain Tumor Segmentation | — | 0
AirNet: Neural Network Transmission over the Air | — | 0
Revisiting Knowledge Distillation for Object Detection | — | 0
Backdoor Attacks on Self-Supervised Learning | Code | 1
Intra-Document Cascading: Learning to Select Passages for Neural Document Ranking | Code | 1
Data-Free Knowledge Distillation for Heterogeneous Federated Learning | Code | 1
Comparing Kullback-Leibler Divergence and Mean Squared Error Loss in Knowledge Distillation | Code | 1
Weakly Supervised Dense Video Captioning via Jointly Usage of Knowledge Distillation and Cross-modal Matching | — | 0
Inplace knowledge distillation with teacher assistant for improved training of flexible deep neural networks | — | 0

Benchmark Results

In each configuration, T denotes the teacher model and S the student; all results below are author-claimed and not yet independently verified.

# | Model | Metric | Claimed | Verified | Status
1 | ScaleKD (T: BEiT-L, S: ViT-B/14) | Top-1 accuracy (%) | 86.43 | — | Unverified
2 | ScaleKD (T: Swin-L, S: ViT-B/16) | Top-1 accuracy (%) | 85.53 | — | Unverified
3 | ScaleKD (T: Swin-L, S: ViT-S/16) | Top-1 accuracy (%) | 83.93 | — | Unverified
4 | ScaleKD (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 83.8 | — | Unverified
5 | KD++ (T: RegNetY-16GF, S: ViT-B) | Top-1 accuracy (%) | 83.6 | — | Unverified
6 | VkD (T: RegNetY-160, S: DeiT-S) | Top-1 accuracy (%) | 82.9 | — | Unverified
7 | SpectralKD (T: Swin-S, S: Swin-T) | Top-1 accuracy (%) | 82.7 | — | Unverified
8 | ScaleKD (T: Swin-L, S: ResNet-50) | Top-1 accuracy (%) | 82.55 | — | Unverified
9 | DiffKD (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 82.5 | — | Unverified
10 | DIST (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 82.3 | — | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | SRD (T: resnet-32x4, S: shufflenet-v2) | Top-1 accuracy (%) | 79.86 | — | Unverified
2 | shufflenet-v2 (T: resnet-32x4, S: shufflenet-v2) | Top-1 accuracy (%) | 78.76 | — | Unverified
3 | MV-MR (T: CLIP/ViT-B-16, S: resnet50) | Top-1 accuracy (%) | 78.6 | — | Unverified
4 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 78.28 | — | Unverified
5 | resnet8x4 (T: resnet32x4, S: resnet8x4 [modified]) | Top-1 accuracy (%) | 78.08 | — | Unverified
6 | ReviewKD++ (T: resnet-32x4, S: shufflenet-v2) | Top-1 accuracy (%) | 77.93 | — | Unverified
7 | ReviewKD++ (T: resnet-32x4, S: shufflenet-v1) | Top-1 accuracy (%) | 77.68 | — | Unverified
8 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 77.5 | — | Unverified
9 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 76.68 | — | Unverified
10 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 76.31 | — | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | LSHFM (T: ResNet101, S: ResNet50) | mAP | 93.17 | — | Unverified
2 | LSHFM (T: ResNet101, S: MobileNetV2) | mAP | 90.14 | — | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | TIE-KD (T: AdaBins, S: MobileNetV2) | RMSE | 2.43 | — | Unverified