Knowledge Distillation

Knowledge distillation is the process of transferring knowledge from a large model to a smaller one. While large models (such as very deep neural networks or ensembles of many models) have higher knowledge capacity than small models, this capacity might not be fully utilized.

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 2501–2550 of 4240 papers

Title	Date	Tasks	Status
Cross-Lingual Knowledge Distillation for Answer Sentence Selection in Low-Resource Languages	May 25, 2023	Knowledge DistillationMachine Translation	—Unverified
Fairness Continual Learning Approach to Semantic Scene Understanding in Open-World Environments	May 25, 2023	Continual LearningContinual Semantic Segmentation	—Unverified
Triplet Knowledge Distillation	May 25, 2023	Face Recognitionimage-classification	—Unverified
Collective Knowledge Graph Completion with Mutual Knowledge Distillation	May 25, 2023	Knowledge DistillationKnowledge Graph Completion	—Unverified
On the Impact of Knowledge Distillation for Model Interpretability	May 25, 2023	Knowledge Distillation	—Unverified
PruMUX: Augmenting Data Multiplexing with Model Compression	May 24, 2023	Knowledge Distillationmodel	CodeCode Available
Incorporating Ultrasound Tongue Images for Audio-Visual Speech Enhancement through Knowledge Distillation	May 24, 2023	Automatic Speech RecognitionAutomatic Speech Recognition (ASR)	—Unverified
Deakin RF-Sensing: Experiments on Correlated Knowledge Distillation for Monitoring Human Postures with Radios	May 24, 2023	Knowledge Distillation	—Unverified
HARD: Hard Augmentations for Robust Distillation	May 24, 2023	Data AugmentationDomain Generalization	—Unverified
AdvFunMatch: When Consistent Teaching Meets Adversarial Robustness	May 24, 2023	Adversarial RobustnessKnowledge Distillation	—Unverified
Just CHOP: Embarrassingly Simple LLM Compression	May 24, 2023	Knowledge DistillationLanguage Modeling	—Unverified
Masked Modeling Duo for Speech: Specializing General-Purpose Audio Representation to Speech using Denoising Distillation	May 23, 2023	DenoisingKnowledge Distillation	—Unverified
Sequence-Level Knowledge Distillation for Class-Incremental End-to-End Spoken Language Understanding	May 23, 2023	Continual LearningDecoder	—Unverified
Transferring Learning Trajectories of Neural Networks	May 23, 2023	Knowledge Distillation	—Unverified
One-stop Training of Multiple Capacity Models	May 23, 2023	Knowledge DistillationMachine Translation	—Unverified
D^2TV: Dual Knowledge Distillation and Target-oriented Vision Modeling for Many-to-Many Multimodal Summarization	May 22, 2023	Knowledge Distillation	CodeCode Available
EnSiam: Self-Supervised Learning With Ensemble Representations	May 22, 2023	Contrastive LearningKnowledge Distillation	—Unverified
Distilling Robustness into Natural Language Inference Models with Domain-Targeted Augmentation	May 22, 2023	Data AugmentationKnowledge Distillation	—Unverified
Revisiting Data Augmentation in Model Compression: An Empirical and Comprehensive Study	May 22, 2023	Data AugmentationKnowledge Distillation	—Unverified
One-Shot Federated Learning for LEO Constellations that Reduces Convergence Time from Days to 90 Minutes	May 21, 2023	Federated LearningKnowledge Distillation	—Unverified
DualVC: Dual-mode Voice Conversion using Intra-model Knowledge Distillation and Hybrid Predictive Coding	May 21, 2023	Data AugmentationDecoder	—Unverified
Understanding the Effect of Data Augmentation on Knowledge Distillation	May 21, 2023	Data AugmentationKnowledge Distillation	—Unverified
Accurate Knowledge Distillation with n-best Reranking	May 20, 2023	Knowledge DistillationReranking	—Unverified
Sentence Embedder Guided Utterance Encoder (SEGUE) for Spoken Language Understanding	May 20, 2023	Knowledge DistillationSentence	CodeCode Available
Pseudo-Label Training and Model Inertia in Neural Machine Translation	May 19, 2023	Knowledge DistillationMachine Translation	—Unverified
Catch-Up Distillation: You Only Need to Train Once for Accelerating Sampling	May 18, 2023	Knowledge Distillation	CodeCode Available
BERM: Training the Balanced and Extractable Representation for Matching to Improve Generalization Ability of Dense Retrieval	May 18, 2023	Information RetrievalKnowledge Distillation	—Unverified
DQ-Whisper: Joint Distillation and Quantization for Efficient Multilingual Speech Recognition	May 18, 2023	Knowledge DistillationQuantization	—Unverified
Boost Vision Transformer with GPU-Friendly Sparsity and Quantization	May 18, 2023	BenchmarkingGPU	—Unverified
Student-friendly Knowledge Distillation	May 18, 2023	Knowledge Distillation	—Unverified
When Gradient Descent Meets Derivative-Free Optimization: A Match Made in Black-Box Scenario	May 17, 2023	Knowledge Distillation	—Unverified
Lightweight Self-Knowledge Distillation with Multi-source Information Fusion	May 16, 2023	Knowledge DistillationSelf-Knowledge Distillation	CodeCode Available
Weight-Inherited Distillation for Task-Agnostic BERT Compression	May 16, 2023	Knowledge Distillation	CodeCode Available
Distilling Knowledge for Short-to-Long Term Trajectory Prediction	May 15, 2023	Knowledge DistillationPrediction	—Unverified
Soft Prompt Decoding for Multilingual Dense Retrieval	May 15, 2023	Cross-Lingual Information RetrievalInformation Retrieval	—Unverified
Improving Defensive Distillation using Teacher Assistant	May 14, 2023	Face RecognitionKnowledge Distillation	—Unverified
On enhancing the robustness of Vision Transformers: Defensive Diffusion	May 14, 2023	Computational EfficiencyDenoising	CodeCode Available
Towards Understanding and Improving Knowledge Distillation for Neural Machine Translation	May 14, 2023	Knowledge DistillationMachine Translation	CodeCode Available
Analyzing Compression Techniques for Computer Vision	May 14, 2023	Knowledge DistillationQuantization	—Unverified
GSB: Group Superposition Binarization for Vision Transformer with Limited Training Samples	May 13, 2023	BinarizationKnowledge Distillation	CodeCode Available
Black-box Source-free Domain Adaptation via Two-stage Knowledge Distillation	May 13, 2023	Domain AdaptationKnowledge Distillation	—Unverified
AMTSS: An Adaptive Multi-Teacher Single-Student Knowledge Distillation Framework For Multilingual Language Inference	May 13, 2023	Knowledge Distillation	—Unverified
Knowledge distillation with Segment Anything (SAM) model for Planetary Geological Mapping	May 12, 2023	DecoderImage Segmentation	—Unverified
A Lightweight Domain Adversarial Neural Network Based on Knowledge Distillation for EEG-based Cross-subject Emotion Recognition	May 12, 2023	EEGElectroencephalogram (EEG)	—Unverified
Long-Tailed Question Answering in an Open World	May 11, 2023	Knowledge DistillationLanguage Modelling	—Unverified
A Survey on the Robustness of Computer Vision Models against Common Corruptions	May 10, 2023	Data AugmentationKnowledge Distillation	CodeCode Available
Explainable Knowledge Distillation for On-device Chest X-Ray Classification	May 10, 2023	Explainable artificial intelligenceExplainable Artificial Intelligence (XAI)	—Unverified
DynamicKD: An Effective Knowledge Distillation via Dynamic Entropy Correction-Based Distillation for Gap Optimizing	May 9, 2023	Knowledge Distillation	—Unverified
SRIL: Selective Regularization for Class-Incremental Learning	May 9, 2023	class-incremental learningClass Incremental Learning	—Unverified
Multi-Teacher Knowledge Distillation For Text Image Machine Translation	May 9, 2023	DecoderKnowledge Distillation	CodeCode Available

Show:10 25 50

← PrevPage 51 of 85Next →

All datasets ImageNet CIFAR-100 COCO (Common Objects in Context)COCO 2017 val PASCAL VOC KITTI

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	ScaleKD (T:BEiT-L S:ViT-B/14)	Top-1 accuracy %	86.43	—	Unverified
2	ScaleKD (T:Swin-L S:ViT-B/16)	Top-1 accuracy %	85.53	—	Unverified
3	ScaleKD (T:Swin-L S:ViT-S/16)	Top-1 accuracy %	83.93	—	Unverified
4	ScaleKD (T:Swin-L S:Swin-T)	Top-1 accuracy %	83.8	—	Unverified
5	KD++(T: regnety-16GF S:ViT-B)	Top-1 accuracy %	83.6	—	Unverified
6	VkD (T:RegNety 160 S:DeiT-S)	Top-1 accuracy %	82.9	—	Unverified
7	SpectralKD (T:Swin-S S:Swin-T)	Top-1 accuracy %	82.7	—	Unverified
8	ScaleKD (T:Swin-L S:ResNet-50)	Top-1 accuracy %	82.55	—	Unverified
9	DiffKD (T:Swin-L S: Swin-T)	Top-1 accuracy %	82.5	—	Unverified
10	DIST (T: Swin-L S: Swin-T)	Top-1 accuracy %	82.3	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	SRD (T:resnet-32x4, S:shufflenet-v2)	Top-1 Accuracy (%)	79.86	—	Unverified
2	shufflenet-v2(T:resnet-32x4, S:shufflenet-v2)	Top-1 Accuracy (%)	78.76	—	Unverified
3	MV-MR (T: CLIP/ViT-B-16 S: resnet50)	Top-1 Accuracy (%)	78.6	—	Unverified
4	resnet8x4 (T: resnet32x4 S: resnet8x4)	Top-1 Accuracy (%)	78.28	—	Unverified
5	resnet8x4 (T: resnet32x4 S: resnet8x4 [modified])	Top-1 Accuracy (%)	78.08	—	Unverified
6	ReviewKD++(T:resnet-32x4, S:shufflenet-v2)	Top-1 Accuracy (%)	77.93	—	Unverified
7	ReviewKD++(T:resnet-32x4, S:shufflenet-v1)	Top-1 Accuracy (%)	77.68	—	Unverified
8	resnet8x4 (T: resnet32x4 S: resnet8x4)	Top-1 Accuracy (%)	77.5	—	Unverified
9	resnet8x4 (T: resnet32x4 S: resnet8x4)	Top-1 Accuracy (%)	76.68	—	Unverified
10	resnet8x4 (T: resnet32x4 S: resnet8x4)	Top-1 Accuracy (%)	76.31	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	LSHFM (T: ResNet101 S: ResNet50)	mAP	77.16	—	Unverified
2	LSHFM (T: ResNet101 S: MobileNetV2)	mAP	73.73	—	Unverified
3	ADLIK-Faster (T: Faster R-CNN vit-base S: Faster R-CNN deit-small)	box AP	47.6	—	Unverified
4	ADLIK-Mask (T: Mask R-CNN vit-base S: Mask R-CNN deit-small)	mask AP	42.4	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	ReviewKD++(T: faster rcnn(resnet101), S:faster rcnn(resnet50))	AP@0.5	61.8	—	Unverified
2	ReviewKD++(T: faster rcnn(resnet101), S:faster rcnn(resnet18))	AP@0.5	57.96	—	Unverified
3	ReviewKD++(T: faster rcnn(resnet101), S:faster rcnn(mobilenet-v2))	AP@0.5	55.18	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	LSHFM (T: ResNet101 S: ResNet50)	mAP	93.17	—	Unverified
2	LSHFM (T: ResNet101 S: MobileNetV2)	mAP	90.14	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	TIE-KD (T: Adabins S: MobileNetV2)	RMSE	2.43	—	Unverified