
Knowledge Distillation

Knowledge distillation is the process of transferring knowledge from a large model (the teacher) to a smaller one (the student). While large models (such as very deep neural networks or ensembles of many models) have higher knowledge capacity than small models, this capacity may not be fully utilized; distillation trains the student to reproduce the teacher's behaviour, typically by matching its output distribution, so that much of the teacher's accuracy is retained at a fraction of the inference cost.
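
As a rough illustration, the classic logit-matching recipe (Hinton et al., 2015) combines a hard-label loss with a KL term that pulls the student's temperature-softened predictions toward the teacher's. The sketch below is a minimal, generic PyTorch version, not the method of any particular paper listed on this page; the function name, temperature, and loss weighting are illustrative defaults.

```python
# Minimal knowledge-distillation loss (sketch, assuming PyTorch).
# Hyperparameters and names are illustrative, not tied to any specific paper.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    """Weighted sum of a soft-target KL loss and the usual hard-label CE."""
    # Soften both distributions with the temperature.
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence between softened teacher and student distributions.
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    kd_loss = F.kl_div(log_student, soft_targets,
                       reduction="batchmean") * temperature ** 2
    # Standard cross-entropy against the ground-truth labels.
    ce_loss = F.cross_entropy(student_logits, labels)
    return alpha * kd_loss + (1.0 - alpha) * ce_loss

# Typical training step (teacher frozen, student being trained):
# with torch.no_grad():
#     teacher_logits = teacher(images)
# student_logits = student(images)
# loss = distillation_loss(student_logits, teacher_logits, labels)
# loss.backward()
```

Many of the papers indexed below replace or augment this logit-level objective with feature-, attention-, or relation-level transfer, but the teacher/student structure is the same.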

Papers

Showing 3451–3500 of 4240 papers

Title | Status | Hype
Mixed Sample Augmentation for Online Distillation | — | 0
Online Distilling from Checkpoints for Neural Machine Translation | — | 0
Online Hyperparameter Meta-Learning with Hypergradient Distillation | — | 0
Online Knowledge Distillation via Multi-branch Diversity Enhancement | — | 0
Online Knowledge Distillation with Reward Guidance | — | 0
Online Policy Distillation with Decision-Attention | — | 0
Online pre-training with long-form videos | — | 0
Online Sensor Hallucination via Knowledge Distillation for Multimodal Image Classification | — | 0
On Multilingual Encoder Language Model Compression for Low-Resource Languages | — | 0
On Neural Network Equivalence Checking using SMT Solvers | — | 0
On Reducing Activity with Distillation and Regularization for Energy Efficient Spiking Neural Networks | — | 0
On Self-Distilling Graph Neural Network | — | 0
On student-teacher deviations in distillation: does it pay to disobey? | — | 0
On the benefits of knowledge distillation for adversarial robustness | — | 0
On the Compression of Language Models for Code: An Empirical Study on CodeBERT | — | 0
On the Demystification of Knowledge Distillation: A Residual Network Perspective | — | 0
On The Distribution of Penultimate Activations of Classification Networks | — | 0
On the Efficacy of Knowledge Distillation | — | 0
On the Efficiency of Subclass Knowledge Distillation in Classification Tasks | — | 0
On the Impact of Knowledge Distillation for Model Interpretability | — | 0
On the Impact of White-box Deployment Strategies for Edge AI on Latency and Model Performance | — | 0
On the Interplay Between Sparsity, Naturalness, Intelligibility, and Prosody in Speech Synthesis | — | 0
On the Orthogonality of Knowledge Distillation with Other Techniques: From an Ensemble Perspective | — | 0
On the Query Strategies for Efficient Online Active Distillation | — | 0
Analysis of Knowledge Transfer in Kernel Regime | — | 0
Open-Set Fine-Grained Retrieval via Prompting Vision-Language Evaluator | — | 0
Open-set Short Utterance Forensic Speaker Verification using Teacher-Student Network with Explicit Inductive Bias | — | 0
Open Vocabulary 3D Scene Understanding via Geometry Guided Self-Distillation | — | 0
Open-Vocabulary Object Detection using Pseudo Caption Labels | — | 0
Open-Vocabulary Object Detection with Meta Prompt Representation and Instance Contrastive Optimization | — | 0
Open World DETR: Transformer based Open World Object Detection | — | 0
Leveraging Complementary Attention maps in vision transformers for OCT image analysis | — | 0
OplixNet: Towards Area-Efficient Optical Split-Complex Networks with Real-to-Complex Data Assignment and Knowledge Distillation | — | 0
Optical Flow Distillation: Towards Efficient and Stable Video Style Transfer | — | 0
Optimising TinyML with Quantization and Distillation of Transformer and Mamba Models for Indoor Localisation on Edge Devices | — | 0
Optimizing Knowledge Distillation in Transformers: Enabling Multi-Head Attention without Alignment Barriers | — | 0
Optimizing LLMs for Resource-Constrained Environments: A Survey of Model Compression Techniques | — | 0
Optimizing Multi-Gateway LoRaWAN via Cloud-Edge Collaboration and Knowledge Distillation | — | 0
Optimizing speed/accuracy trade-off for person re-identification via knowledge distillation | — | 0
Learning Deep and Compact Models for Gesture Recognition | Code | 0
Improved Knowledge Distillation via Full Kernel Matrix Transfer | Code | 0
Learning Efficient Detector with Semi-supervised Adaptive Distillation | Code | 0
Efficient Logit-based Knowledge Distillation of Deep Spiking Neural Networks for Full-Range Timestep Deployment | Code | 0
TernaryBERT: Distillation-aware Ultra-low Bit BERT | Code | 0
Training on the Test Model: Contamination in Ranking Distillation | Code | 0
Leaning Compact and Representative Features for Cross-Modality Person Re-Identification | Code | 0
Beyond the Limitation of Monocular 3D Detector via Knowledge Distillation | Code | 0
ADD: Frequency Attention and Multi-View based Knowledge Distillation to Detect Low-Quality Compressed Deepfake Images | Code | 0
Beyond Conventional Transformers: The Medical X-ray Attention (MXA) Block for Improved Multi-Label Diagnosis Using Knowledge Distillation | Code | 0
Pretrained Speech Encoders and Efficient Fine-tuning Methods for Speech Translation: UPC at IWSLT 2022 | Code | 0
Page 70 of 85

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | ScaleKD (T: BEiT-L, S: ViT-B/14) | Top-1 accuracy (%) | 86.43 | — | Unverified
2 | ScaleKD (T: Swin-L, S: ViT-B/16) | Top-1 accuracy (%) | 85.53 | — | Unverified
3 | ScaleKD (T: Swin-L, S: ViT-S/16) | Top-1 accuracy (%) | 83.93 | — | Unverified
4 | ScaleKD (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 83.8 | — | Unverified
5 | KD++ (T: RegNetY-16GF, S: ViT-B) | Top-1 accuracy (%) | 83.6 | — | Unverified
6 | VkD (T: RegNetY-160, S: DeiT-S) | Top-1 accuracy (%) | 82.9 | — | Unverified
7 | SpectralKD (T: Swin-S, S: Swin-T) | Top-1 accuracy (%) | 82.7 | — | Unverified
8 | ScaleKD (T: Swin-L, S: ResNet-50) | Top-1 accuracy (%) | 82.55 | — | Unverified
9 | DiffKD (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 82.5 | — | Unverified
10 | DIST (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 82.3 | — | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SRD (T: resnet-32x4, S: shufflenet-v2) | Top-1 Accuracy (%) | 79.86 | — | Unverified
2 | shufflenet-v2 (T: resnet-32x4, S: shufflenet-v2) | Top-1 Accuracy (%) | 78.76 | — | Unverified
3 | MV-MR (T: CLIP/ViT-B-16, S: resnet50) | Top-1 Accuracy (%) | 78.6 | — | Unverified
4 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 Accuracy (%) | 78.28 | — | Unverified
5 | resnet8x4 (T: resnet32x4, S: resnet8x4 [modified]) | Top-1 Accuracy (%) | 78.08 | — | Unverified
6 | ReviewKD++ (T: resnet-32x4, S: shufflenet-v2) | Top-1 Accuracy (%) | 77.93 | — | Unverified
7 | ReviewKD++ (T: resnet-32x4, S: shufflenet-v1) | Top-1 Accuracy (%) | 77.68 | — | Unverified
8 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 Accuracy (%) | 77.5 | — | Unverified
9 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 Accuracy (%) | 76.68 | — | Unverified
10 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 Accuracy (%) | 76.31 | — | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | LSHFM (T: ResNet101, S: ResNet50) | mAP | 93.17 | — | Unverified
2 | LSHFM (T: ResNet101, S: MobileNetV2) | mAP | 90.14 | — | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TIE-KD (T: Adabins, S: MobileNetV2) | RMSE | 2.43 | — | Unverified