
Knowledge Distillation

Knowledge distillation is the process of transferring knowledge from a large model to a smaller one. While large models (such as very deep neural networks or ensembles of many models) have higher knowledge capacity than small models, this capacity may not be fully utilized, so a well-trained student can often recover much of the teacher's accuracy at a fraction of the inference cost.
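
In the most common formulation (soft-target distillation in the style of Hinton et al.), the student is trained to match the teacher's temperature-softened output distribution in addition to fitting the ground-truth labels. The sketch below is a minimal PyTorch illustration of that idea, not the method of any particular paper listed here; the temperature T, the mixing weight alpha, and the train_step helper are illustrative assumptions.

```python
# Minimal soft-target knowledge distillation sketch (PyTorch).
# Assumptions: `teacher` and `student` are classifiers with the same number
# of output classes; T (temperature) and alpha (loss mix) are illustrative.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    # Soft-target term: KL divergence between the temperature-softened
    # teacher and student distributions, scaled by T^2 so gradient
    # magnitudes stay comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target term: ordinary cross-entropy against ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

def train_step(student, teacher, optimizer, images, labels):
    # Typical training step: the teacher is frozen, only the student updates.
    teacher.eval()
    with torch.no_grad():
        teacher_logits = teacher(images)
    student_logits = student(images)
    loss = distillation_loss(student_logits, teacher_logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Higher temperatures expose more of the teacher's relative probabilities over incorrect classes, which is the extra signal the student distills beyond the hard labels.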

Papers

Showing 3351–3400 of 4240 papers

Title | Status | Hype
Multi-Modality Distillation via Learning the teacher's modality-level Gram Matrix | | 0
Multimodal Locally Enhanced Transformer for Continuous Sign Language Recognition | | 0
Multimodal Prescriptive Deep Learning | | 0
Multi-Objective Diverse Human Motion Prediction With Knowledge Distillation | | 0
Multi-Person Full Body Pose Estimation | | 0
Multi-perspective Contrastive Logit Distillation | | 0
Multiple Degradation and Reconstruction Network for Single Image Denoising via Knowledge Distillation | | 0
Multi scale Feature Extraction and Fusion for Online Knowledge Distillation | | 0
Learning to Purification for Unsupervised Person Re-identification | | 0
Multi-stage Distillation Framework for Cross-Lingual Semantic Similarity Matching | | 0
Multi-stage Progressive Compression of Conformer Transducer for On-device Speech Recognition | | 0
Multi-Strategy Knowledge Distillation Based Teacher-Student Framework for Machine Reading Comprehension | | 0
Multitask Emotion Recognition Model with Knowledge Distillation and Task Discriminator | | 0
Multi-Task Learning with Knowledge Distillation for Dense Prediction | | 0
Multi-Teacher Knowledge Distillation for Incremental Implicitly-Refined Classification | | 0
Multivariate Prototype Representation for Domain-Generalized Incremental Learning | | 0
Multi-View Attention Transfer for Efficient Speech Enhancement | | 0
Multi-View Feature Representation for Dialogue Generation with Bidirectional Distillation | | 0
Multi-View Knowledge Distillation from Crowd Annotations for Out-of-Domain Generalization | | 0
Multi-view knowledge distillation transformer for human action recognition | | 0
MUSE: Feature Self-Distillation with Mutual Information and Self-Information | | 0
MUST: A Multilingual Student-Teacher Learning approach for low-resource speech recognition | | 0
Mutual Adversarial Training: Learning together is better than going alone | | 0
Mutual Information Guided Backdoor Mitigation for Pre-trained Encoders | | 0
Mutual Learning for Finetuning Click-Through Rate Prediction Models | | 0
Mutual-Learning Improves End-to-End Speech Translation | | 0
Mutual Learning of Single- and Multi-Channel End-to-End Neural Diarization | | 0
Mutually-paced Knowledge Distillation for Cross-lingual Temporal Knowledge Graph Reasoning | | 0
MVKT-ECG: Efficient Single-lead ECG Classification on Multi-Label Arrhythmia by Multi-View Knowledge Transferring | | 0
NAIST English-to-Japanese Simultaneous Translation System for IWSLT 2021 Simultaneous Text-to-text Task | | 0
Narrowing the Coordinate-frame Gap in Behavior Prediction Models: Distillation for Efficient and Accurate Scene-centric Motion Forecasting | | 0
NaturalReasoning: Reasoning in the Wild with 2.8M Challenging Questions | | 0
Natural Statistics of Network Activations and Implications for Knowledge Distillation | | 0
Nearest Neighbor Knowledge Distillation for Neural Machine Translation | | 0
Neighbourhood Distillation: On the benefits of non end-to-end distillation | | 0
NEO-KD: Knowledge-Distillation-Based Adversarial Training for Robust Multi-Exit Neural Networks | | 0
NestedNet: Learning Nested Sparse Structures in Deep Neural Networks | | 0
Network-Agnostic Knowledge Transfer for Medical Image Segmentation | | 0
Reconstructing Pruned Filters using Cheap Spatial Transformations | | 0
Neural Architecture Search for Effective Teacher-Student Knowledge Transfer in Language Models | | 0
Neural Architecture Search via Ensemble-based Knowledge Distillation | | 0
Neural Collapse Inspired Knowledge Distillation | | 0
Neural Compatibility Modeling with Attentive Knowledge Distillation | | 0
Neural Machine Translation from Simplified Translations | | 0
NeuroComparatives: Neuro-Symbolic Distillation of Comparative Knowledge | | 0
New Perspective on Progressive GANs Distillation for One-class Novelty Detection | | 0
NewsBERT: Distilling Pre-trained Language Model for Intelligent News Application | | 0
NICEST: Noisy Label Correction and Training for Robust Scene Graph Generation | | 0
Nickel and Diming Your GAN: A Dual-Method Approach to Enhancing GAN Efficiency via Knowledge Distillation | | 0
NIFF: Alleviating Forgetting in Generalized Few-Shot Object Detection via Neural Instance Feature Forging | | 0
Page 68 of 85

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | ScaleKD (T: BEiT-L, S: ViT-B/14) | Top-1 accuracy (%) | 86.43 | | Unverified
2 | ScaleKD (T: Swin-L, S: ViT-B/16) | Top-1 accuracy (%) | 85.53 | | Unverified
3 | ScaleKD (T: Swin-L, S: ViT-S/16) | Top-1 accuracy (%) | 83.93 | | Unverified
4 | ScaleKD (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 83.8 | | Unverified
5 | KD++ (T: regnety-16GF, S: ViT-B) | Top-1 accuracy (%) | 83.6 | | Unverified
6 | VkD (T: RegNety 160, S: DeiT-S) | Top-1 accuracy (%) | 82.9 | | Unverified
7 | SpectralKD (T: Swin-S, S: Swin-T) | Top-1 accuracy (%) | 82.7 | | Unverified
8 | ScaleKD (T: Swin-L, S: ResNet-50) | Top-1 accuracy (%) | 82.55 | | Unverified
9 | DiffKD (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 82.5 | | Unverified
10 | DIST (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 82.3 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SRD (T: resnet-32x4, S: shufflenet-v2) | Top-1 Accuracy (%) | 79.86 | | Unverified
2 | shufflenet-v2 (T: resnet-32x4, S: shufflenet-v2) | Top-1 Accuracy (%) | 78.76 | | Unverified
3 | MV-MR (T: CLIP/ViT-B-16, S: resnet50) | Top-1 Accuracy (%) | 78.6 | | Unverified
4 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 Accuracy (%) | 78.28 | | Unverified
5 | resnet8x4 (T: resnet32x4, S: resnet8x4 [modified]) | Top-1 Accuracy (%) | 78.08 | | Unverified
6 | ReviewKD++ (T: resnet-32x4, S: shufflenet-v2) | Top-1 Accuracy (%) | 77.93 | | Unverified
7 | ReviewKD++ (T: resnet-32x4, S: shufflenet-v1) | Top-1 Accuracy (%) | 77.68 | | Unverified
8 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 Accuracy (%) | 77.5 | | Unverified
9 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 Accuracy (%) | 76.68 | | Unverified
10 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 Accuracy (%) | 76.31 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | LSHFM (T: ResNet101, S: ResNet50) | mAP | 93.17 | | Unverified
2 | LSHFM (T: ResNet101, S: MobileNetV2) | mAP | 90.14 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TIE-KD (T: Adabins, S: MobileNetV2) | RMSE | 2.43 | | Unverified
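
The image-classification leaderboards above report Top-1 accuracy: the percentage of test images for which the student's highest-scoring class matches the ground-truth label. The snippet below is a minimal sketch of how that metric is typically computed, assuming a trained PyTorch student model and a standard classification DataLoader; the function name and device argument are illustrative.

```python
import torch

@torch.no_grad()
def top1_accuracy(model, loader, device="cuda"):
    # Count how often the argmax class prediction equals the label.
    model.eval()
    correct, total = 0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        preds = model(images).argmax(dim=-1)
        correct += (preds == labels).sum().item()
        total += labels.numel()
    # Reported as a percentage, matching the tables above.
    return 100.0 * correct / total
```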