SOTAVerified

Knowledge Distillation

Knowledge distillation is the process of transferring knowledge from a large model to a smaller one. While large models (such as very deep neural networks or ensembles of many models) have higher knowledge capacity than small models, this capacity might not be fully utilized.
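In the common logit-matching formulation (Hinton et al., 2015), a small "student" network is trained to match the teacher's temperature-softened output distribution in addition to the ground-truth labels. The sketch below illustrates that loss; it is a minimal example, not the method of any specific paper listed here, and the temperature and weighting values are illustrative assumptions.

```python
# Minimal sketch of the classic soft-target distillation loss.
# Assumed hyperparameters (temperature=4.0, alpha=0.5) are illustrative only.
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    """Blend a soft-target KL term with the usual hard-label cross-entropy."""
    # Soften both output distributions with the same temperature.
    # In practice teacher_logits are computed under torch.no_grad().
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)

    # KL divergence between the softened distributions; the T^2 factor
    # keeps gradient magnitudes comparable across temperatures.
    kd_term = F.kl_div(log_soft_student, soft_teacher,
                       reduction="batchmean") * temperature ** 2

    # Standard cross-entropy against the ground-truth labels.
    ce_term = F.cross_entropy(student_logits, labels)

    return alpha * kd_term + (1.0 - alpha) * ce_term


if __name__ == "__main__":
    # Toy usage: a batch of 8 examples over 100 classes with random logits.
    teacher_out = torch.randn(8, 100)
    student_out = torch.randn(8, 100)
    targets = torch.randint(0, 100, (8,))
    print(distillation_loss(student_out, teacher_out, targets).item())
```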

Papers

Showing 1901–1950 of 4240 papers

Title | Status | Hype
Cross-Lingual Knowledge Distillation for Answer Sentence Selection in Low-Resource Languages | - | 0
OVO: Open-Vocabulary Occupancy | Code | 1
Fairness Continual Learning Approach to Semantic Scene Understanding in Open-World Environments | - | 0
Collective Knowledge Graph Completion with Mutual Knowledge Distillation | - | 0
Knowledge Distillation for Neural Transducer-based Target-Speaker ASR: Exploiting Parallel Mixture/Single-Talker Speech Data | - | 0
Triplet Knowledge Distillation | - | 0
Camera-Incremental Object Re-Identification with Identity Knowledge Evolution | Code | 0
How to Distill your BERT: An Empirical Study on the Impact of Weight Initialisation and Distillation Objectives | Code | 1
HARD: Hard Augmentations for Robust Distillation | - | 0
Incorporating Ultrasound Tongue Images for Audio-Visual Speech Enhancement through Knowledge Distillation | - | 0
CoLaDa: A Collaborative Label Denoising Framework for Cross-lingual Named Entity Recognition | Code | 2
Deakin RF-Sensing: Experiments on Correlated Knowledge Distillation for Monitoring Human Postures with Radios | - | 0
Just CHOP: Embarrassingly Simple LLM Compression | - | 0
AdvFunMatch: When Consistent Teaching Meets Adversarial Robustness | - | 0
PruMUX: Augmenting Data Multiplexing with Model Compression | Code | 0
NORM: Knowledge Distillation via N-to-One Representation Matching | Code | 1
Sequence-Level Knowledge Distillation for Class-Incremental End-to-End Spoken Language Understanding | - | 0
One-stop Training of Multiple Capacity Models | - | 0
Masked Modeling Duo for Speech: Specializing General-Purpose Audio Representation to Speech using Denoising Distillation | - | 0
Decoupled Kullback-Leibler Divergence Loss | Code | 1
Transferring Learning Trajectories of Neural Networks | - | 0
EnSiam: Self-Supervised Learning With Ensemble Representations | - | 0
Distilling Robustness into Natural Language Inference Models with Domain-Targeted Augmentation | - | 0
Lion: Adversarial Distillation of Proprietary Large Language Models | Code | 2
Is Synthetic Data From Diffusion Models Ready for Knowledge Distillation? | Code | 1
D^2TV: Dual Knowledge Distillation and Target-oriented Vision Modeling for Many-to-Many Multimodal Summarization | Code | 0
Revisiting Data Augmentation in Model Compression: An Empirical and Comprehensive Study | - | 0
Understanding the Effect of Data Augmentation on Knowledge Distillation | - | 0
DualVC: Dual-mode Voice Conversion using Intra-model Knowledge Distillation and Hybrid Predictive Coding | - | 0
One-Shot Federated Learning for LEO Constellations that Reduces Convergence Time from Days to 90 Minutes | - | 0
DisCo: Distilled Student Models Co-training for Semi-supervised Text Mining | Code | 1
Lifting the Curse of Capacity Gap in Distilling Language Models | Code | 1
Accurate Knowledge Distillation with n-best Reranking | - | 0
Sentence Embedder Guided Utterance Encoder (SEGUE) for Spoken Language Understanding | Code | 0
Pseudo-Label Training and Model Inertia in Neural Machine Translation | - | 0
Boost Vision Transformer with GPU-Friendly Sparsity and Quantization | - | 0
BERM: Training the Balanced and Extractable Representation for Matching to Improve Generalization Ability of Dense Retrieval | - | 0
Cross-modality Data Augmentation for End-to-End Sign Language Translation | Code | 1
DQ-Whisper: Joint Distillation and Quantization for Efficient Multilingual Speech Recognition | - | 0
Student-friendly Knowledge Distillation | - | 0
Catch-Up Distillation: You Only Need to Train Once for Accelerating Sampling | Code | 0
AD-KD: Attribution-Driven Knowledge Distillation for Language Model Compression | Code | 1
When Gradient Descent Meets Derivative-Free Optimization: A Match Made in Black-Box Scenario | - | 0
Tailoring Instructions to Student's Learning Levels Boosts Knowledge Distillation | Code | 1
Weight-Inherited Distillation for Task-Agnostic BERT Compression | Code | 0
Lightweight Self-Knowledge Distillation with Multi-source Information Fusion | Code | 0
Bridging the Domain Gap: Self-Supervised 3D Scene Understanding with Foundation Models | Code | 1
Soft Prompt Decoding for Multilingual Dense Retrieval | - | 0
Distilling Knowledge for Short-to-Long Term Trajectory Prediction | - | 0
Improving Defensive Distillation using Teacher Assistant | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | ScaleKD (T:BEiT-L S:ViT-B/14) | Top-1 accuracy % | 86.43 | - | Unverified
2 | ScaleKD (T:Swin-L S:ViT-B/16) | Top-1 accuracy % | 85.53 | - | Unverified
3 | ScaleKD (T:Swin-L S:ViT-S/16) | Top-1 accuracy % | 83.93 | - | Unverified
4 | ScaleKD (T:Swin-L S:Swin-T) | Top-1 accuracy % | 83.8 | - | Unverified
5 | KD++ (T: regnety-16GF S:ViT-B) | Top-1 accuracy % | 83.6 | - | Unverified
6 | VkD (T:RegNety 160 S:DeiT-S) | Top-1 accuracy % | 82.9 | - | Unverified
7 | SpectralKD (T:Swin-S S:Swin-T) | Top-1 accuracy % | 82.7 | - | Unverified
8 | ScaleKD (T:Swin-L S:ResNet-50) | Top-1 accuracy % | 82.55 | - | Unverified
9 | DiffKD (T:Swin-L S: Swin-T) | Top-1 accuracy % | 82.5 | - | Unverified
10 | DIST (T: Swin-L S: Swin-T) | Top-1 accuracy % | 82.3 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SRD (T:resnet-32x4, S:shufflenet-v2) | Top-1 Accuracy (%) | 79.86 | - | Unverified
2 | shufflenet-v2 (T:resnet-32x4, S:shufflenet-v2) | Top-1 Accuracy (%) | 78.76 | - | Unverified
3 | MV-MR (T: CLIP/ViT-B-16 S: resnet50) | Top-1 Accuracy (%) | 78.6 | - | Unverified
4 | resnet8x4 (T: resnet32x4 S: resnet8x4) | Top-1 Accuracy (%) | 78.28 | - | Unverified
5 | resnet8x4 (T: resnet32x4 S: resnet8x4 [modified]) | Top-1 Accuracy (%) | 78.08 | - | Unverified
6 | ReviewKD++ (T:resnet-32x4, S:shufflenet-v2) | Top-1 Accuracy (%) | 77.93 | - | Unverified
7 | ReviewKD++ (T:resnet-32x4, S:shufflenet-v1) | Top-1 Accuracy (%) | 77.68 | - | Unverified
8 | resnet8x4 (T: resnet32x4 S: resnet8x4) | Top-1 Accuracy (%) | 77.5 | - | Unverified
9 | resnet8x4 (T: resnet32x4 S: resnet8x4) | Top-1 Accuracy (%) | 76.68 | - | Unverified
10 | resnet8x4 (T: resnet32x4 S: resnet8x4) | Top-1 Accuracy (%) | 76.31 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | LSHFM (T: ResNet101 S: ResNet50) | mAP | 93.17 | - | Unverified
2 | LSHFM (T: ResNet101 S: MobileNetV2) | mAP | 90.14 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TIE-KD (T: Adabins S: MobileNetV2) | RMSE | 2.43 | - | Unverified