Knowledge Distillation

Knowledge distillation is the process of transferring knowledge from a large model to a smaller one. While large models (such as very deep neural networks or ensembles of many models) have higher knowledge capacity than small models, this capacity might not be fully utilized.
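As a rough illustration of what "transferring knowledge" can mean in practice, the sketch below computes a soft-target distillation loss: the student is penalized for diverging from the teacher's temperature-softened output distribution. This is one common formulation and an assumption here, since the text above does not specify a method. The function names, the temperature value, and the example logits are all hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Assumed formulation (soft targets); not taken from the source text.
    """
    p = softmax(teacher_logits, temperature)  # teacher distribution
    q = softmax(student_logits, temperature)  # student distribution
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

# Illustrative logits for a 3-class problem (made-up values).
teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]  # roughly agrees with the teacher
far_student = [0.5, 1.0, 4.0]    # ranks the classes in reverse

loss_close = distillation_loss(close_student, teacher)
loss_far = distillation_loss(far_student, teacher)
print(loss_close < loss_far)
```

A student whose outputs track the teacher's receives a smaller loss than one that disagrees, which is the signal used to train the smaller model. In practice this term is often combined with an ordinary supervised loss on the true labels.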

Papers

Showing 3401–3450 of 4240 papers

Title	Date	Tasks	Status
Pro-KD: Progressive Distillation by Following the Footsteps of the Teacher	Oct 16, 2021	Image Classification	Unverified
Robustness Challenges in Model Distillation and Pruning for Natural Language Understanding	Oct 16, 2021	Knowledge Distillation, Model Compression	Unverified
A Short Study on Compressing Decoder-Based Language Models	Oct 16, 2021	Decoder, Knowledge Distillation	Unverified
Know your tools well: Better and faster QA with synthetic examples	Oct 16, 2021	Diversity, Knowledge Distillation	Unverified
Sparse Progressive Distillation: Resolving Overfitting under Pretrain-and-Finetune Paradigm	Oct 15, 2021	Knowledge Distillation	Unverified
From Multimodal to Unimodal Attention in Transformers using Knowledge Distillation	Oct 15, 2021	Knowledge Distillation, Multimodal Deep Learning	Unverified
Multilingual Neural Machine Translation: Can Linguistic Hierarchies Help?	Oct 15, 2021	Knowledge Distillation, Machine Translation	Unverified
Kronecker Decomposition for GPT Compression	Oct 15, 2021	Knowledge Distillation, Language Modeling	Unverified
Language Modelling via Learning to Rank	Oct 13, 2021	Knowledge Distillation, Language Modelling	Unverified
False Negative Distillation and Contrastive Learning for Personalized Outfit Recommendation	Oct 13, 2021	Contrastive Learning, Data Augmentation	Unverified
CONetV2: Efficient Auto-Channel Size Optimization for CNNs	Oct 13, 2021	Knowledge Distillation, Neural Architecture Search	Code Available
Compact CNN Models for On-device Ocular-based User Recognition in Mobile Devices	Oct 11, 2021	Knowledge Distillation, Network Pruning	Unverified
Rectifying the Data Bias in Knowledge Distillation	Oct 11, 2021	Face Recognition, Face Verification	Unverified
Towards Streaming Egocentric Action Anticipation	Oct 11, 2021	Action Anticipation, Knowledge Distillation	Unverified
Towards Data-Free Domain Generalization	Oct 9, 2021	Data-free Knowledge Distillation, Domain Generalization	Code Available
Visualizing the embedding space to explain the effect of knowledge distillation	Oct 9, 2021	Knowledge Distillation	Unverified
Cross-modal Knowledge Distillation for Vision-to-Sensor Action Recognition	Oct 8, 2021	Action Recognition, Activity Recognition	Code Available
Knowledge Distillation for Neural Transducers from Large Self-Supervised Pre-trained Models	Oct 7, 2021	Automatic Speech Recognition (ASR)	Unverified
Peer Collaborative Learning for Polyphonic Sound Event Detection	Oct 7, 2021	Event Detection, Knowledge Distillation	Unverified
Online Hyperparameter Meta-Learning with Hypergradient Distillation	Oct 6, 2021	Hyperparameter Optimization, Knowledge Distillation	Unverified
Inter-Domain Alignment for Predicting High-Resolution Brain Networks Using Teacher-Student Learning	Oct 6, 2021	Decoder, Domain Adaptation	Code Available
On the Interplay Between Sparsity, Naturalness, Intelligibility, and Prosody in Speech Synthesis	Oct 4, 2021	Knowledge Distillation, Speech Synthesis	Unverified
Student Helping Teacher: Teacher Evolution via Self-Knowledge Distillation	Oct 1, 2021	Knowledge Distillation, Self-Knowledge Distillation	Code Available
Deep Neural Compression Via Concurrent Pruning and Self-Distillation	Sep 30, 2021	Knowledge Distillation, Language Modeling	Unverified
Improving Neural Ranking via Lossless Knowledge Distillation	Sep 30, 2021	Knowledge Distillation, Learning-To-Rank	Unverified
Automated Channel Pruning with Learned Importance	Sep 29, 2021	Denoising, GPU	Unverified
Distilling GANs with Style-Mixed Triplets for X2I Translation with Limited Data	Sep 29, 2021	Image Generation, Knowledge Distillation	Unverified
Explaining Knowledge Graph Embedding via Latent Rule Learning	Sep 29, 2021	Graph Embedding, Knowledge Distillation	Unverified
SeqPATE: Differentially Private Text Generation via Knowledge Distillation	Sep 29, 2021	Knowledge Distillation, Sentence	Unverified
Not All Regions are Worthy to be Distilled: Region-aware Knowledge Distillation Towards Efficient Image-to-Image Translation	Sep 29, 2021	All, Contrastive Learning	Unverified
Scaling Fair Learning to Hundreds of Intersectional Groups	Sep 29, 2021	Attribute, Fairness	Unverified
Self-Slimming Vision Transformer	Sep 29, 2021	Knowledge Distillation	Unverified
Convolutional Neural Network Compression through Generalized Kronecker Product Decomposition	Sep 29, 2021	Image Classification	Unverified
Stingy Teacher: Sparse Logits Suffice to Fail Knowledge Distillation	Sep 29, 2021	Knowledge Distillation	Unverified
Generate, Annotate, and Learn: Generative Models Advance Self-Training and Knowledge Distillation	Sep 29, 2021	Few-Shot Learning, Knowledge Distillation	Unverified
To Smooth or not to Smooth? On Compatibility between Label Smoothing and Knowledge Distillation	Sep 29, 2021	Image Classification	Unverified
Adaptive Label Smoothing with Self-Knowledge	Sep 29, 2021	Knowledge Distillation, Machine Translation	Unverified
Representation Consolidation from Multiple Expert Teachers	Sep 29, 2021	Knowledge Distillation	Unverified
Source-Target Unified Knowledge Distillation for Memory-Efficient Federated Domain Adaptation on Edge Devices	Sep 29, 2021	Domain Adaptation, Knowledge Distillation	Unverified
Wakening Past Concepts without Past Data: Class-incremental Learning from Placebos	Sep 29, 2021	Class Incremental Learning	Unverified
A Unified Knowledge Distillation Framework for Deep Directed Graphical Models	Sep 29, 2021	Continual Learning, Federated Learning	Unverified
Learning Efficient Image Super-Resolution Networks via Structure-Regularized Pruning	Sep 29, 2021	Image Super-Resolution, Knowledge Distillation	Unverified
Understanding the Success of Knowledge Distillation -- A Data Augmentation Perspective	Sep 29, 2021	Active Learning, Data Augmentation	Unverified
Self-supervised Models are Good Teaching Assistants for Vision Transformers	Sep 29, 2021	Image Classification, Knowledge Distillation	Unverified
MOBA: Multi-teacher Model Based Reinforcement Learning	Sep 29, 2021	Decision Making, Knowledge Distillation	Unverified
Fast and Efficient Once-For-All Networks for Diverse Hardware Deployment	Sep 29, 2021	All, GPU	Unverified
Self-Distilled Pruning Of Neural Networks	Sep 29, 2021	Knowledge Distillation, Language Modeling	Unverified
Exploiting Knowledge Distillation for Few-Shot Image Generation	Sep 29, 2021	Diversity, Image Generation	Unverified
A Comprehensive Overhaul of Distilling Unconditional GANs	Sep 29, 2021	Knowledge Distillation	Unverified
Reducing the Teacher-Student Gap via Adaptive Temperatures	Sep 29, 2021	Knowledge Distillation	Unverified

Datasets: ImageNet, CIFAR-100, COCO (Common Objects in Context), COCO 2017 val, PASCAL VOC, KITTI

Benchmark Results

ImageNet
#	Model	Metric	Claimed	Verified	Status
1	ScaleKD (T:BEiT-L S:ViT-B/14)	Top-1 accuracy %	86.43	—	Unverified
2	ScaleKD (T:Swin-L S:ViT-B/16)	Top-1 accuracy %	85.53	—	Unverified
3	ScaleKD (T:Swin-L S:ViT-S/16)	Top-1 accuracy %	83.93	—	Unverified
4	ScaleKD (T:Swin-L S:Swin-T)	Top-1 accuracy %	83.8	—	Unverified
5	KD++(T: regnety-16GF S:ViT-B)	Top-1 accuracy %	83.6	—	Unverified
6	VkD (T:RegNety 160 S:DeiT-S)	Top-1 accuracy %	82.9	—	Unverified
7	SpectralKD (T:Swin-S S:Swin-T)	Top-1 accuracy %	82.7	—	Unverified
8	ScaleKD (T:Swin-L S:ResNet-50)	Top-1 accuracy %	82.55	—	Unverified
9	DiffKD (T:Swin-L S: Swin-T)	Top-1 accuracy %	82.5	—	Unverified
10	DIST (T: Swin-L S: Swin-T)	Top-1 accuracy %	82.3	—	Unverified

CIFAR-100
#	Model	Metric	Claimed	Verified	Status
1	SRD (T:resnet-32x4, S:shufflenet-v2)	Top-1 Accuracy (%)	79.86	—	Unverified
2	shufflenet-v2(T:resnet-32x4, S:shufflenet-v2)	Top-1 Accuracy (%)	78.76	—	Unverified
3	MV-MR (T: CLIP/ViT-B-16 S: resnet50)	Top-1 Accuracy (%)	78.6	—	Unverified
4	resnet8x4 (T: resnet32x4 S: resnet8x4)	Top-1 Accuracy (%)	78.28	—	Unverified
5	resnet8x4 (T: resnet32x4 S: resnet8x4 [modified])	Top-1 Accuracy (%)	78.08	—	Unverified
6	ReviewKD++(T:resnet-32x4, S:shufflenet-v2)	Top-1 Accuracy (%)	77.93	—	Unverified
7	ReviewKD++(T:resnet-32x4, S:shufflenet-v1)	Top-1 Accuracy (%)	77.68	—	Unverified
8	resnet8x4 (T: resnet32x4 S: resnet8x4)	Top-1 Accuracy (%)	77.5	—	Unverified
9	resnet8x4 (T: resnet32x4 S: resnet8x4)	Top-1 Accuracy (%)	76.68	—	Unverified
10	resnet8x4 (T: resnet32x4 S: resnet8x4)	Top-1 Accuracy (%)	76.31	—	Unverified

COCO (Common Objects in Context)
#	Model	Metric	Claimed	Verified	Status
1	LSHFM (T: ResNet101 S: ResNet50)	mAP	77.16	—	Unverified
2	LSHFM (T: ResNet101 S: MobileNetV2)	mAP	73.73	—	Unverified
3	ADLIK-Faster (T: Faster R-CNN vit-base S: Faster R-CNN deit-small)	box AP	47.6	—	Unverified
4	ADLIK-Mask (T: Mask R-CNN vit-base S: Mask R-CNN deit-small)	mask AP	42.4	—	Unverified

COCO 2017 val
#	Model	Metric	Claimed	Verified	Status
1	ReviewKD++(T: faster rcnn(resnet101), S:faster rcnn(resnet50))	AP@0.5	61.8	—	Unverified
2	ReviewKD++(T: faster rcnn(resnet101), S:faster rcnn(resnet18))	AP@0.5	57.96	—	Unverified
3	ReviewKD++(T: faster rcnn(resnet101), S:faster rcnn(mobilenet-v2))	AP@0.5	55.18	—	Unverified

PASCAL VOC
#	Model	Metric	Claimed	Verified	Status
1	LSHFM (T: ResNet101 S: ResNet50)	mAP	93.17	—	Unverified
2	LSHFM (T: ResNet101 S: MobileNetV2)	mAP	90.14	—	Unverified

KITTI
#	Model	Metric	Claimed	Verified	Status
1	TIE-KD (T: Adabins S: MobileNetV2)	RMSE	2.43	—	Unverified