
Knowledge Distillation

Knowledge distillation is the process of transferring knowledge from a large model to a smaller one. While large models (such as very deep neural networks or ensembles of many models) have a higher knowledge capacity than small models, that capacity is often not fully utilized; a compact student trained to mimic the large teacher's outputs can therefore approach the teacher's accuracy at a fraction of the inference cost.
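As a concrete illustration, the classic formulation (Hinton et al., 2015) trains the student on a weighted mix of the ordinary hard-label loss and a temperature-softened match to the teacher's output distribution. Below is a minimal PyTorch sketch of that objective; the function name `kd_loss` and the default temperature and weighting are illustrative choices, not taken from any paper listed on this page.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Classic soft-target distillation loss (Hinton et al., 2015).

    Combines cross-entropy on the ground-truth labels with a KL term
    that pushes the student's temperature-softened distribution toward
    the teacher's. T and alpha are illustrative defaults.
    """
    # Hard-label term: standard cross-entropy against the true labels.
    ce = F.cross_entropy(student_logits, labels)
    # Soft-label term: KL between temperature-softened distributions.
    # The T**2 factor keeps gradient magnitudes comparable across temperatures.
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T ** 2)
    return alpha * ce + (1.0 - alpha) * kl
```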

Papers

Showing 3351–3400 of 4240 papers

Title | Status | Hype
LTD: Low Temperature Distillation for Robust Adversarial Training | — | 0
Leveraging Advantages of Interactive and Non-Interactive Models for Vector-Based Cross-Lingual Information Retrieval | — | 0
Knowledge Cross-Distillation for Membership Privacy | — | 0
Domain-Lifelong Learning for Dialogue State Tracking via Knowledge Preservation Networks | Code | 0
Universal-KD: Attention-based Output-Grounded Intermediate Layer Knowledge Distillation | — | 0
deepQuest-py: Large and Distilled Models for Quality Estimation | Code | 0
Papago’s Submission for the WMT21 Quality Estimation Shared Task | — | 0
HW-TSC’s Participation in the WMT 2021 Large-Scale Multilingual Translation Task | — | 0
HW-TSC’s Participation in the WMT 2021 News Translation Shared Task | — | 0
Students Who Study Together Learn Better: On the Importance of Collective Knowledge Distillation for Domain Transfer in Fact Verification | — | 0
Limitations of Knowledge Distillation for Zero-shot Transfer Learning | — | 0
The NiuTrans System for the WMT 2021 Efficiency Task | — | 0
How to Select One Among All? An Empirical Study Towards the Robustness of Knowledge Distillation in Natural Language Understanding | — | 0
PDALN: Progressive Domain Adaptation over a Pre-trained Model for Low-Resource Cross-Domain Named Entity Recognition | — | 0
Mutual-Learning Improves End-to-End Speech Translation | — | 0
The Mininglamp Machine Translation System for WMT21 | — | 0
Exploring Non-Autoregressive Text Style Transfer | Code | 0
The LMU Munich System for the WMT 2021 Large-Scale Multilingual Machine Translation Shared Task | — | 0
Multilingual Neural Machine Translation: Can Linguistic Hierarchies Help? | — | 0
AUTOSUMM: Automatic Model Creation for Text Summarization | — | 0
RW-KD: Sample-wise Loss Terms Re-Weighting for Knowledge Distillation | — | 0
TenTrans Large-Scale Multilingual Machine Translation System for WMT21 | — | 0
GAML-BERT: Improving BERT Early Exiting by Gradient Aligned Mutual Learning | — | 0
Improving Stance Detection with Multi-Dataset Learning and Knowledge Distillation | Code | 0
Efficient Machine Translation with Model Pruning and Quantization | — | 0
NVIDIA NeMo’s Neural Machine Translation Systems for English-German and English-Russian News and Biomedical Tasks at WMT21 | — | 0
PP-ShiTu: A Practical Lightweight Image Recognition System | Code | 0
Distilling Knowledge for Empathy Detection | Code | 0
Collaborative Learning of Bidirectional Decoders for Unsupervised Text Style Transfer | Code | 0
Combining Curriculum Learning and Knowledge Distillation for Dialogue Generation | — | 0
Rethinking the Knowledge Distillation From the Perspective of Model Calibration | — | 0
Estimating and Maximizing Mutual Information for Knowledge Distillation | — | 0
On Cross-Layer Alignment for Model Fusion of Heterogeneous Neural Networks | — | 0
Towards Model Agnostic Federated Learning Using Knowledge Distillation | — | 0
NxMTransformer: Semi-Structured Sparsification for Natural Language Understanding via ADMM | — | 0
Temporal Knowledge Distillation for On-device Audio Classification | — | 0
GenURL: A General Framework for Unsupervised Representation Learning | — | 0
Beyond Classification: Knowledge Distillation using Multi-Object Impressions | — | 0
Response-based Distillation for Incremental Object Detection | — | 0
MUSE: Feature Self-Distillation with Mutual Information and Self-Information | — | 0
Reconstructing Pruned Filters using Cheap Spatial Transformations | — | 0
X-Distill: Improving Self-Supervised Monocular Depth via Cross-Task Distillation | — | 0
Pseudo Supervised Monocular Depth Estimation with Teacher-Student Network | — | 0
How and When Adversarial Robustness Transfers in Knowledge Distillation? | — | 0
Augmenting Knowledge Distillation With Peer-To-Peer Mutual Learning For Model Compression | — | 0
Class Incremental Online Streaming Learning | — | 0
Knowledge distillation from language model to acoustic model: a hierarchical multi-task learning approach | — | 0
FedHe: Heterogeneous Models and Communication-Efficient Federated Learning | Code | 0
Adaptive Distillation: Aggregating Knowledge from Multiple Paths for Efficient Distillation | Code | 0
HRKD: Hierarchical Relational Knowledge Distillation for Cross-domain Language Model Compression | Code | 0
Page 68 of 85

Benchmark Results

In the Model column, T: denotes the teacher model and S: the student model.

# | Model | Metric | Claimed | Verified | Status
1 | ScaleKD (T:BEiT-L S:ViT-B/14) | Top-1 accuracy (%) | 86.43 | — | Unverified
2 | ScaleKD (T:Swin-L S:ViT-B/16) | Top-1 accuracy (%) | 85.53 | — | Unverified
3 | ScaleKD (T:Swin-L S:ViT-S/16) | Top-1 accuracy (%) | 83.93 | — | Unverified
4 | ScaleKD (T:Swin-L S:Swin-T) | Top-1 accuracy (%) | 83.8 | — | Unverified
5 | KD++ (T:RegNetY-16GF S:ViT-B) | Top-1 accuracy (%) | 83.6 | — | Unverified
6 | VkD (T:RegNetY-160 S:DeiT-S) | Top-1 accuracy (%) | 82.9 | — | Unverified
7 | SpectralKD (T:Swin-S S:Swin-T) | Top-1 accuracy (%) | 82.7 | — | Unverified
8 | ScaleKD (T:Swin-L S:ResNet-50) | Top-1 accuracy (%) | 82.55 | — | Unverified
9 | DiffKD (T:Swin-L S:Swin-T) | Top-1 accuracy (%) | 82.5 | — | Unverified
10 | DIST (T:Swin-L S:Swin-T) | Top-1 accuracy (%) | 82.3 | — | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | SRD (T:resnet32x4 S:shufflenet-v2) | Top-1 accuracy (%) | 79.86 | — | Unverified
2 | shufflenet-v2 (T:resnet32x4 S:shufflenet-v2) | Top-1 accuracy (%) | 78.76 | — | Unverified
3 | MV-MR (T:CLIP/ViT-B-16 S:resnet50) | Top-1 accuracy (%) | 78.6 | — | Unverified
4 | resnet8x4 (T:resnet32x4 S:resnet8x4) | Top-1 accuracy (%) | 78.28 | — | Unverified
5 | resnet8x4 (T:resnet32x4 S:resnet8x4 [modified]) | Top-1 accuracy (%) | 78.08 | — | Unverified
6 | ReviewKD++ (T:resnet32x4 S:shufflenet-v2) | Top-1 accuracy (%) | 77.93 | — | Unverified
7 | ReviewKD++ (T:resnet32x4 S:shufflenet-v1) | Top-1 accuracy (%) | 77.68 | — | Unverified
8 | resnet8x4 (T:resnet32x4 S:resnet8x4) | Top-1 accuracy (%) | 77.5 | — | Unverified
9 | resnet8x4 (T:resnet32x4 S:resnet8x4) | Top-1 accuracy (%) | 76.68 | — | Unverified
10 | resnet8x4 (T:resnet32x4 S:resnet8x4) | Top-1 accuracy (%) | 76.31 | — | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | LSHFM (T:ResNet101 S:ResNet50) | mAP | 93.17 | — | Unverified
2 | LSHFM (T:ResNet101 S:MobileNetV2) | mAP | 90.14 | — | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | TIE-KD (T:Adabins S:MobileNetV2) | RMSE | 2.43 | — | Unverified
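Each benchmark entry above pairs a frozen teacher (T:) with a student (S:) trained to match it. As a rough, hypothetical sketch of how such a pair is wired together, the training step below reuses the `kd_loss` helper from the sketch earlier on this page; the actual recipes behind these claimed numbers (augmentation, schedules, feature-level losses) differ per paper.

```python
import torch

def distill_step(teacher, student, optimizer, images, labels):
    # One distillation step for a (teacher, student) pair, e.g. any
    # T:/S: combination from the tables above. Illustrative sketch only.
    teacher.eval()
    with torch.no_grad():               # the teacher is frozen
        teacher_logits = teacher(images)
    student_logits = student(images)
    # kd_loss is the soft-target helper defined earlier on this page.
    loss = kd_loss(student_logits, teacher_logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```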