
Knowledge Distillation

Knowledge distillation is the process of transferring knowledge from a large model to a smaller one. While large models (such as very deep neural networks or ensembles of many models) have higher knowledge capacity than small models, this capacity may not be fully utilized; distillation trains a compact student model to reproduce the behavior of a large teacher, retaining much of its accuracy at a fraction of the inference cost.
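
As a concrete illustration, the classic soft-target formulation of distillation (Hinton et al., 2015) trains the student to match a temperature-softened teacher distribution while also fitting the ground-truth labels. The sketch below is a generic PyTorch version, not the method of any specific paper listed on this page; the temperature `T` and mixing weight `alpha` are illustrative defaults.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Soft-target knowledge distillation loss (Hinton et al., 2015).

    Blends a KL term between temperature-softened teacher and student
    distributions with the usual cross-entropy on ground-truth labels.
    """
    # Soften both distributions with temperature T.
    soft_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_soft_student = F.log_softmax(student_logits / T, dim=-1)
    # KL divergence, scaled by T^2 so its gradient magnitude stays
    # comparable to the cross-entropy term as T grows.
    kd = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce
```

During training, `teacher_logits` come from a frozen forward pass of the large model; only the student's parameters receive gradients.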

Papers

Showing 2751–2800 of 4240 papers

A Survey on Model Compression for Large Language Models
A Survey on Recent Teacher-student Learning Studies
A Survey on Symbolic Knowledge Distillation of Large Language Models
A Survey on Transformer Compression
Asymmetric Decision-Making in Online Knowledge Distillation: Unifying Consensus and Divergence
ADPS: Asymmetric Distillation Post-Segmentation for Image Anomaly Detection
Asymmetric Image Retrieval with Cross Model Compatible Ensembles
Asymmetric Temperature Scaling Makes Larger Networks Teach Well Again
Asynchronous Convergence in Multi-Task Learning via Knowledge Distillation from Converged Tasks
Edge Bias in Federated Learning and its Solution by Buffered Knowledge Distillation
A Technical Study into Small Reasoning Language Models
A Theoretical Analysis of Soft-Label vs Hard-Label Training in Neural Networks
A Transformer-in-Transformer Network Utilizing Knowledge Distillation for Image Recognition
Attention-Guided Answer Distillation for Machine Reading Comprehension
Attention-guided Feature Distillation for Semantic Segmentation
Attention is all you need for boosting graph convolutional neural network
AttentionLite: Towards Efficient Self-Attention Models for Vision
MKD: a Multi-Task Knowledge Distillation Approach for Pretrained Language Models
Audio-Oriented Multimodal Machine Comprehension: Task, Dataset and Model
Audio Representation Learning by Distilling Video as Privileged Information
Augmentation with Projection: Towards an Effective and Efficient Data Augmentation Paradigm for Distillation
Augmenting Knowledge Distillation With Peer-To-Peer Mutual Learning For Model Compression
A Unified Compression Framework for Efficient Speech-Driven Talking-Face Generation
A Unified Framework for Continual Learning and Unlearning
A Unified Knowledge-Distillation and Semi-Supervised Learning Framework to Improve Industrial Ads Delivery Systems
A Unified Knowledge Distillation Framework for Deep Directed Graphical Models
AutoADR: Automatic Model Design for Ad Relevance
AutoDistil: Few-shot Task-agnostic Neural Architecture Search for Distilling Large Language Models
AutoDistill: an End-to-End Framework to Explore and Distill Hardware-Efficient Language Models
AUTOKD: Automatic Knowledge Distillation Into A Student Architecture Family
Automated Channel Pruning with Learned Importance
Automated Graph Self-supervised Learning via Multi-teacher Knowledge Distillation
Automatic Block-wise Pruning with Auxiliary Gating Structures for Deep Convolutional Neural Networks
Automatic Mixed-Precision Quantization Search of BERT
AUTOSUMM: Automatic Model Creation for Text Summarization
A vision transformer-based framework for knowledge transfer from multi-modal to mono-modal lymphoma subtyping models
Aware of the History: Trajectory Forecasting with the Local Behavior Data
AWF: Adaptive Weight Fusion for Enhanced Class Incremental Semantic Segmentation
BabyHGRN: Exploring RNNs for Sample-Efficient Training of Language Models
Background Adaptation with Residual Modeling for Exemplar-Free Class-Incremental Semantic Segmentation
Knowledge Distillation for Human Action Anticipation
Baidu Neural Machine Translation Systems for WMT19
Balance Divergence for Knowledge Distillation
Balanced softmax cross-entropy for incremental learning with and without memory
Balancing Cost and Benefit with Tied-Multi Transformers
A predictive machine learning force field framework for liquid electrolyte development
BanglaEmbed: Efficient Sentence Embedding Models for a Low-Resource Language Using Cross-Lingual Distillation Techniques
BAPO: Base-Anchored Preference Optimization for Overcoming Forgetting in Large Language Models Personalization
Bayesian-Optimized One-Step Diffusion Model with Knowledge Distillation for Real-Time 3D Human Motion Prediction
BD-KD: Balancing the Divergences for Online Knowledge Distillation

Benchmark Results

Each entry names the distillation method together with its teacher (T:) and student (S:) architectures; the Verified column is empty for entries whose Status is Unverified.

# | Model | Metric | Claimed | Verified | Status
1 | ScaleKD (T: BEiT-L, S: ViT-B/14) | Top-1 accuracy (%) | 86.43 | – | Unverified
2 | ScaleKD (T: Swin-L, S: ViT-B/16) | Top-1 accuracy (%) | 85.53 | – | Unverified
3 | ScaleKD (T: Swin-L, S: ViT-S/16) | Top-1 accuracy (%) | 83.93 | – | Unverified
4 | ScaleKD (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 83.8 | – | Unverified
5 | KD++ (T: RegNetY-16GF, S: ViT-B) | Top-1 accuracy (%) | 83.6 | – | Unverified
6 | VkD (T: RegNetY-160, S: DeiT-S) | Top-1 accuracy (%) | 82.9 | – | Unverified
7 | SpectralKD (T: Swin-S, S: Swin-T) | Top-1 accuracy (%) | 82.7 | – | Unverified
8 | ScaleKD (T: Swin-L, S: ResNet-50) | Top-1 accuracy (%) | 82.55 | – | Unverified
9 | DiffKD (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 82.5 | – | Unverified
10 | DIST (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 82.3 | – | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SRD (T: resnet-32x4, S: shufflenet-v2) | Top-1 accuracy (%) | 79.86 | – | Unverified
2 | shufflenet-v2 (T: resnet-32x4, S: shufflenet-v2) | Top-1 accuracy (%) | 78.76 | – | Unverified
3 | MV-MR (T: CLIP/ViT-B-16, S: resnet50) | Top-1 accuracy (%) | 78.6 | – | Unverified
4 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 78.28 | – | Unverified
5 | resnet8x4 (T: resnet32x4, S: resnet8x4 [modified]) | Top-1 accuracy (%) | 78.08 | – | Unverified
6 | ReviewKD++ (T: resnet-32x4, S: shufflenet-v2) | Top-1 accuracy (%) | 77.93 | – | Unverified
7 | ReviewKD++ (T: resnet-32x4, S: shufflenet-v1) | Top-1 accuracy (%) | 77.68 | – | Unverified
8 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 77.5 | – | Unverified
9 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 76.68 | – | Unverified
10 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 76.31 | – | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | LSHFM (T: ResNet101, S: ResNet50) | mAP | 93.17 | – | Unverified
2 | LSHFM (T: ResNet101, S: MobileNetV2) | mAP | 90.14 | – | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TIE-KD (T: Adabins, S: MobileNetV2) | RMSE | 2.43 | – | Unverified
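
All entries above are listed as Unverified, so verification reduces to recomputing the claimed metric from a released checkpoint. For the Top-1 accuracy tables, that is a standard evaluation pass; the sketch below assumes a generic PyTorch classifier and DataLoader and is not the site's actual verification pipeline.

```python
import torch

@torch.no_grad()
def top1_accuracy(model, loader, device="cpu"):
    """Recompute Top-1 accuracy (%) for a distilled student checkpoint.

    `model` and `loader` are assumed to be a standard image classifier
    and its evaluation DataLoader; nothing here is method-specific.
    """
    model.eval().to(device)
    correct = total = 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        preds = model(images).argmax(dim=-1)   # predicted class per image
        correct += (preds == labels).sum().item()
        total += labels.numel()
    return 100.0 * correct / total  # percentage, comparable to the Claimed column
```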