SOTAVerified

Knowledge Distillation

Knowledge distillation is the process of transferring knowledge from a large model to a smaller one. While large models (such as very deep neural networks or ensembles of many models) have higher knowledge capacity than small models, this capacity might not be fully utilized, so a smaller student model trained to mimic the larger teacher can often recover much of its performance at a fraction of the inference cost.
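
The most common form of this transfer is the soft-target objective of Hinton et al. (2015): the student is trained to match the teacher's temperature-softened output distribution in addition to the ground-truth labels. Below is a minimal sketch of that loss, assuming PyTorch; the function name distillation_loss and the defaults T=4.0 and alpha=0.9 are illustrative choices, not taken from any paper listed on this page.

    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
        """Weighted sum of a soft-target KD term and the usual hard-label loss.

        T and alpha are illustrative defaults, not values from a specific paper.
        """
        # Soften both distributions with temperature T; a higher T exposes more
        # of the teacher's information about non-target classes.
        soft_teacher = F.softmax(teacher_logits / T, dim=-1)
        log_soft_student = F.log_softmax(student_logits / T, dim=-1)
        # KL divergence between the softened distributions, scaled by T^2 so its
        # gradient magnitude stays comparable to the hard-label term.
        kd_term = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * (T * T)
        # Standard cross-entropy against the ground-truth labels.
        ce_term = F.cross_entropy(student_logits, labels)
        return alpha * kd_term + (1.0 - alpha) * ce_term

Many of the feature-based and relation-based methods listed below replace or augment this output-level term with losses on intermediate representations, but the overall teacher-student training setup is the same.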

Papers

Showing 651–700 of 4240 papers

Title | Status | Hype
Dialogue Chain-of-Thought Distillation for Commonsense-aware Conversational Agents | Code | 1
Focal and Global Knowledge Distillation for Detectors | Code | 1
Directed Acyclic Transformer for Non-Autoregressive Machine Translation | Code | 1
FoPro-KD: Fourier Prompted Effective Knowledge Distillation for Long-Tailed Medical Image Recognition | Code | 1
Deliberation on Priors: Trustworthy Reasoning of Large Language Models on Knowledge Graphs | Code | 1
Can LLM Watermarks Robustly Prevent Unauthorized Knowledge Distillation? | Code | 1
Prototype-based Incremental Few-Shot Semantic Segmentation | Code | 1
General Cyclical Training of Neural Networks | Code | 1
Dense Interspecies Face Embedding | Code | 1
COMEDIAN: Self-Supervised Learning and Knowledge Distillation for Action Spotting using Transformers | Code | 1
Generative Bias for Robust Visual Question Answering | Code | 1
Communication-Efficient Federated Learning through Adaptive Weight Clustering and Server-Side Distillation | Code | 1
Geometer: Graph Few-Shot Class-Incremental Learning via Prototype Representation | Code | 1
Geometric Knowledge Distillation: Topology Compression for Graph Neural Networks | Code | 1
GlobalFlowNet: Video Stabilization using Deep Distilled Global Motion Estimates | Code | 1
Global Knowledge Calibration for Fast Open-Vocabulary Segmentation | Code | 1
A Discrepancy Aware Framework for Robust Anomaly Detection | Code | 1
Good Teachers Explain: Explanation-Enhanced Knowledge Distillation | Code | 1
Gradient-based Intra-attention Pruning on Pre-trained Language Models | Code | 1
Graph-based Knowledge Distillation: A survey and experimental evaluation | Code | 1
A framework for benchmarking class-out-of-distribution detection and its application to ImageNet | Code | 1
Comparing Kullback-Leibler Divergence and Mean Squared Error Loss in Knowledge Distillation | Code | 1
A Symmetric Dual Encoding Dense Retrieval Framework for Knowledge-Intensive Visual Question Answering | Code | 1
Contrastive Model Inversion for Data-Free Knowledge Distillation | Code | 1
Complementary Relation Contrastive Distillation | Code | 1
Group Knowledge Transfer: Federated Learning of Large CNNs at the Edge | Code | 1
Deliberated Domain Bridging for Domain Adaptive Semantic Segmentation | Code | 1
HAD-Net: A Hierarchical Adversarial Knowledge Distillation Network for Improved Enhanced Tumour Segmentation Without Post-Contrast Images | Code | 1
Contrastive Distillation on Intermediate Representations for Language Model Compression | Code | 1
Heterogeneous Knowledge Distillation using Information Flow Modeling | Code | 1
Hierarchical Self-supervised Augmented Knowledge Distillation | Code | 1
Comprehensive Knowledge Distillation with Causal Intervention | Code | 1
Densely Guided Knowledge Distillation using Multiple Teacher Assistants | Code | 1
Honest-but-Curious Nets: Sensitive Attributes of Private Inputs Can Be Secretly Coded into the Classifiers' Outputs | Code | 1
How to Distill your BERT: An Empirical Study on the Impact of Weight Initialisation and Distillation Objectives | Code | 1
How to Select One Among All? An Extensive Empirical Study Towards the Robustness of Knowledge Distillation in Natural Language Understanding | Code | 1
Deep Structured Instance Graph for Distilling Object Detectors | Code | 1
AgeFlow: Conditional Age Progression and Regression with Normalizing Flows | Code | 1
I^3 Retriever: Incorporating Implicit Interaction in Pre-trained Language Models for Passage Retrieval | Code | 1
IDa-Det: An Information Discrepancy-aware Distillation for 1-bit Detectors | Code | 1
ABKD: Pursuing a Proper Allocation of the Probability Mass in Knowledge Distillation via α-β-Divergence | Code | 1
Improve Cross-Architecture Generalization on Dataset Distillation | Code | 1
Improved Techniques for Training Adaptive Deep Networks | Code | 1
Improve Object Detection with Feature-based Knowledge Distillation: Towards Accurate and Efficient Detectors | Code | 1
A Token is Worth over 1,000 Tokens: Efficient Knowledge Distillation through Low-Rank Clone | Code | 1
Improving Knowledge Distillation via Regularizing Feature Norm and Direction | Code | 1
Improving Neural Cross-Lingual Summarization via Employing Optimal Transport Distance for Knowledge Distillation | Code | 1
Computation-Efficient Knowledge Distillation via Uncertainty-Aware Mixup | Code | 1
Defocus Blur Detection via Depth Distillation | Code | 1
Camera clustering for scalable stream-based active distillation | Code | 1
Page 14 of 85

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | ScaleKD (T:BEiT-L S:ViT-B/14) | Top-1 accuracy % | 86.43 | — | Unverified
2 | ScaleKD (T:Swin-L S:ViT-B/16) | Top-1 accuracy % | 85.53 | — | Unverified
3 | ScaleKD (T:Swin-L S:ViT-S/16) | Top-1 accuracy % | 83.93 | — | Unverified
4 | ScaleKD (T:Swin-L S:Swin-T) | Top-1 accuracy % | 83.8 | — | Unverified
5 | KD++ (T: regnety-16GF S:ViT-B) | Top-1 accuracy % | 83.6 | — | Unverified
6 | VkD (T:RegNety 160 S:DeiT-S) | Top-1 accuracy % | 82.9 | — | Unverified
7 | SpectralKD (T:Swin-S S:Swin-T) | Top-1 accuracy % | 82.7 | — | Unverified
8 | ScaleKD (T:Swin-L S:ResNet-50) | Top-1 accuracy % | 82.55 | — | Unverified
9 | DiffKD (T:Swin-L S: Swin-T) | Top-1 accuracy % | 82.5 | — | Unverified
10 | DIST (T: Swin-L S: Swin-T) | Top-1 accuracy % | 82.3 | — | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | SRD (T:resnet-32x4, S:shufflenet-v2) | Top-1 Accuracy (%) | 79.86 | — | Unverified
2 | shufflenet-v2 (T:resnet-32x4, S:shufflenet-v2) | Top-1 Accuracy (%) | 78.76 | — | Unverified
3 | MV-MR (T: CLIP/ViT-B-16 S: resnet50) | Top-1 Accuracy (%) | 78.6 | — | Unverified
4 | resnet8x4 (T: resnet32x4 S: resnet8x4) | Top-1 Accuracy (%) | 78.28 | — | Unverified
5 | resnet8x4 (T: resnet32x4 S: resnet8x4 [modified]) | Top-1 Accuracy (%) | 78.08 | — | Unverified
6 | ReviewKD++ (T:resnet-32x4, S:shufflenet-v2) | Top-1 Accuracy (%) | 77.93 | — | Unverified
7 | ReviewKD++ (T:resnet-32x4, S:shufflenet-v1) | Top-1 Accuracy (%) | 77.68 | — | Unverified
8 | resnet8x4 (T: resnet32x4 S: resnet8x4) | Top-1 Accuracy (%) | 77.5 | — | Unverified
9 | resnet8x4 (T: resnet32x4 S: resnet8x4) | Top-1 Accuracy (%) | 76.68 | — | Unverified
10 | resnet8x4 (T: resnet32x4 S: resnet8x4) | Top-1 Accuracy (%) | 76.31 | — | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | LSHFM (T: ResNet101 S: ResNet50) | mAP | 93.17 | — | Unverified
2 | LSHFM (T: ResNet101 S: MobileNetV2) | mAP | 90.14 | — | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | TIE-KD (T: Adabins S: MobileNetV2) | RMSE | 2.43 | — | Unverified