Knowledge Distillation

Knowledge distillation is the process of transferring knowledge from a large model to a smaller one. While large models (such as very deep neural networks or ensembles of many models) have a higher knowledge capacity than small models, that capacity may not be fully utilized, so a compact student can often recover most of a teacher's accuracy at a fraction of the inference cost. In the classic formulation (Hinton et al., 2015), the student is trained to match the teacher's temperature-softened output distribution in addition to the ground-truth labels.

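A minimal sketch of that soft-target objective in PyTorch follows. The function name and the temperature/alpha defaults are illustrative, not drawn from any paper listed below:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.9):
    """Soft-target KL term combined with hard-label cross-entropy."""
    # Soften both distributions with the same temperature T.
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # KL divergence between teacher and student soft targets;
    # the T^2 factor keeps gradient magnitudes comparable across T.
    kd = F.kl_div(soft_student, soft_teacher, reduction="batchmean")
    kd = kd * temperature ** 2
    # Standard cross-entropy against the ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce
```

The alpha weight trades off imitating the teacher against fitting the labels; in practice the KL term usually dominates.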
Papers

Showing 3551–3600 of 4240 papers

Title | Status | Hype
ESPnet-ST IWSLT 2021 Offline Speech Translation System | | 0
Local-Global Knowledge Distillation in Heterogeneous Federated Learning with Non-IID Data | | 0
Reward-Based 1-bit Compressed Federated Distillation on Blockchain | | 0
Learning without Forgetting for 3D Point Cloud Objects | Code | 0
Adapt-and-Distill: Developing Small, Fast and Effective Pretrained Language Models for Domains | | 0
PQK: Model Compression via Pruning, Quantization, and Knowledge Distillation | | 0
Dealing with training and test segmentation mismatch: FBK@IWSLT2021 | | 0
Efficient Inference via Universal LSH Kernel | | 0
Knowledge Distillation via Instance-level Sequence Learning | | 0
Positive-Unlabeled Data Purification in the Wild for Object Detection | | 0
Data-Free Knowledge Distillation for Image Super-Resolution | Code | 0
Space-Time Distillation for Video Super-Resolution | | 0
Cross Modality Knowledge Distillation for Multi-Modal Aerial View Object Classification | Code | 0
Minimally Invasive Surgery for Sparse Neural Networks in Contrastive Manner | | 0
Teacher's pet: understanding and mitigating biases in distillation | | 0
Tree-Like Decision Distillation | | 0
CapsuleRRT: Relationships-Aware Regression Tracking via Capsules | | 0
Recurrent Stacking of Layers in Neural Networks: An Application to Neural Machine Translation | | 0
Dual-Teacher Class-Incremental Learning With Data-Free Generative Replay | | 0
Knowledge distillation from multi-modal to mono-modal segmentation networks | | 0
Dynamic Knowledge Distillation With Noise Elimination for RGB-D Salient Object Detection | | 0
Topology Distillation for Recommender System | | 0
Simon Says: Evaluating and Mitigating Bias in Pruned Neural Networks with Knowledge Distillation | Code | 0
CoDERT: Distilling Encoder Representations with Co-learning for Transducer-based Speech Recognition | | 0
Energy-efficient Knowledge Distillation for Spiking Neural Networks | | 0
LENAS: Learning-based Neural Architecture Search and Ensemble for 3D Radiotherapy Dose Prediction | Code | 0
Guiding Teacher Forcing with Seer Forcing for Neural Machine Translation | | 0
Generate, Annotate, and Learn: NLP with Synthetic Text | Code | 0
RefBERT: Compressing BERT by Referencing to Pre-computed Representations | | 0
Marginal Utility Diminishes: Exploring the Minimum Knowledge for BERT Knowledge Distillation | Code | 0
AKE-GNN: Effective Graph Learning with Adaptive Knowledge Exchange | | 0
Learning by Distillation: A Self-Supervised Learning Framework for Optical Flow Estimation | | 0
RoSearch: Search for Robust Student Architectures When Distilling Pre-trained Language Models | | 0
MergeDistill: Merging Pre-trained Language Models using Distillation | | 0
ERNIE-Tiny: A Progressive Distillation Framework for Pretrained Transformer Compression | Code | 0
Not All Knowledge Is Created Equal: Mutual Distillation of Confident Knowledge | | 0
Rejuvenating Low-Frequency Words: Making the Most of Parallel Data in Non-Autoregressive Translation | Code | 0
One Teacher is Enough? Pre-trained Language Model Distillation from Multiple Teachers | | 0
Cost-effective Deployment of BERT Models in Serverless Environment | | 0
Modality-specific Distillation | | 0
Multi-Grained Knowledge Distillation for Named Entity Recognition | | 0
Natural Statistics of Network Activations and Implications for Knowledge Distillation | | 0
Reinforced Iterative Knowledge Distillation for Cross-Lingual Named Entity Recognition | | 0
Continual Learning for Neural Machine Translation | | 0
Claim Matching Beyond English to Scale Global Fact-Checking | | 0
Greedy-layer Pruning: Speeding up Transformer Models for Natural Language Processing | Code | 0
FReTAL: Generalizing Deepfake Detection using Knowledge Distillation and Representation Learning | | 0
Not Far Away, Not So Close: Sample Efficient Nearest Neighbour Data Augmentation via MiniMax | Code | 0
Joint-DetNAS: Upgrade Your Detector with NAS, Pruning and Dynamic Distillation | | 0
Towards Understanding Knowledge Distillation | | 0
Page 72 of 85

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | ScaleKD (T: BEiT-L, S: ViT-B/14) | Top-1 accuracy (%) | 86.43 | | Unverified
2 | ScaleKD (T: Swin-L, S: ViT-B/16) | Top-1 accuracy (%) | 85.53 | | Unverified
3 | ScaleKD (T: Swin-L, S: ViT-S/16) | Top-1 accuracy (%) | 83.93 | | Unverified
4 | ScaleKD (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 83.8 | | Unverified
5 | KD++ (T: RegNetY-16GF, S: ViT-B) | Top-1 accuracy (%) | 83.6 | | Unverified
6 | VkD (T: RegNetY-160, S: DeiT-S) | Top-1 accuracy (%) | 82.9 | | Unverified
7 | SpectralKD (T: Swin-S, S: Swin-T) | Top-1 accuracy (%) | 82.7 | | Unverified
8 | ScaleKD (T: Swin-L, S: ResNet-50) | Top-1 accuracy (%) | 82.55 | | Unverified
9 | DiffKD (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 82.5 | | Unverified
10 | DIST (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 82.3 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | SRD (T: resnet-32x4, S: shufflenet-v2) | Top-1 accuracy (%) | 79.86 | | Unverified
2 | shufflenet-v2 (T: resnet-32x4, S: shufflenet-v2) | Top-1 accuracy (%) | 78.76 | | Unverified
3 | MV-MR (T: CLIP/ViT-B-16, S: resnet50) | Top-1 accuracy (%) | 78.6 | | Unverified
4 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 78.28 | | Unverified
5 | resnet8x4 (T: resnet32x4, S: resnet8x4 [modified]) | Top-1 accuracy (%) | 78.08 | | Unverified
6 | ReviewKD++ (T: resnet-32x4, S: shufflenet-v2) | Top-1 accuracy (%) | 77.93 | | Unverified
7 | ReviewKD++ (T: resnet-32x4, S: shufflenet-v1) | Top-1 accuracy (%) | 77.68 | | Unverified
8 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 77.5 | | Unverified
9 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 76.68 | | Unverified
10 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 76.31 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | LSHFM (T: ResNet101, S: ResNet50) | mAP | 93.17 | | Unverified
2 | LSHFM (T: ResNet101, S: MobileNetV2) | mAP | 90.14 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | TIE-KD (T: Adabins, S: MobileNetV2) | RMSE | 2.43 | | Unverified
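
In these tables, (T: …, S: …) names the teacher and student of each pair: the student is trained against the frozen teacher and then scored with the listed metric. A minimal sketch of that loop, under the same assumptions as the sketch above (all names, including distillation_loss, are illustrative rather than taken from any listed paper):

```python
import torch

@torch.no_grad()
def top1_accuracy(model, loader, device="cuda"):
    """Fraction of examples whose argmax prediction matches the label."""
    model.eval()
    correct = total = 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        preds = model(images).argmax(dim=-1)
        correct += (preds == labels).sum().item()
        total += labels.numel()
    return 100.0 * correct / total

def train_step(teacher, student, optimizer, images, labels):
    """One distillation step: the teacher is frozen, only the student learns."""
    teacher.eval()
    with torch.no_grad():
        teacher_logits = teacher(images)
    student_logits = student(images)
    loss = distillation_loss(student_logits, teacher_logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The "Claimed" column reports the paper's own number for the student; "Verified" stays empty until an independent reproduction of that training run is recorded.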