SOTAVerified

Knowledge Distillation

Knowledge distillation is the process of transferring knowledge from a large model to a smaller one. While large models (such as very deep neural networks or ensembles of many models) have greater knowledge capacity than small models, that capacity is often not fully utilized; a compact student trained to mimic the teacher's outputs can therefore approach the teacher's accuracy at a fraction of the inference cost.
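The standard recipe, soft-target distillation (Hinton et al., 2015), trains the student against a temperature-softened copy of the teacher's output distribution blended with the usual hard-label cross-entropy. Below is a minimal PyTorch sketch, assuming teacher and student logits are already computed; the temperature `T` and mixing weight `alpha` are illustrative defaults, not values taken from any paper listed on this page.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Soft-target KD loss: alpha * KL(teacher || student) + (1 - alpha) * CE."""
    # Soften both distributions with temperature T; higher T exposes
    # the teacher's "dark knowledge" about non-target classes.
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    # The T^2 factor keeps soft-target gradients on the same scale
    # as the hard-label term (as in the original paper).
    kd = F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce

# Toy usage with random logits standing in for real teacher/student outputs.
if __name__ == "__main__":
    student_logits = torch.randn(8, 100)   # student forward pass
    teacher_logits = torch.randn(8, 100)   # frozen teacher forward pass
    labels = torch.randint(0, 100, (8,))
    print(distillation_loss(student_logits, teacher_logits, labels).item())
```

In practice the teacher runs in eval mode under torch.no_grad(), so only the student's parameters receive gradients.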

Papers

Showing 3701–3750 of 4240 papers

Title | Status | Hype
Distilling the Undistillable: Learning from a Nasty Teacher | Code | 0
Tiny Updater: Towards Efficient Neural Network-Driven Software Updating | Code | 0
AdaGMLP: AdaBoosting GNN-to-MLP Knowledge Distillation | Code | 0
Induced Model Matching: Restricted Models Help Train Full-Featured Models | Code | 0
Induced Model Matching: How Restricted Models Can Help Larger Ones | Code | 0
Weight Copy and Low-Rank Adaptation for Few-Shot Distillation of Vision Transformers | Code | 0
Spatial-Channel Token Distillation for Vision MLPs | Code | 0
Masked Student Dataset of Expressions | Code | 0
InDistill: Information flow-preserving knowledge distillation for model compression | Code | 0
COMBHelper: A Neural Approach to Reduce Search Space for Graph Combinatorial Problems | Code | 0
Distilling Knowledge by Mimicking Features | Code | 0
Reciprocal Supervised Learning Improves Neural Machine Translation | Code | 0
Why does Knowledge Distillation Work? Rethink its Attention and Fidelity Mechanism | Code | 0
Incremental Meta-Learning via Episodic Replay Distillation for Few-Shot Image Recognition | Code | 0
MCC-KD: Multi-CoT Consistent Knowledge Distillation | Code | 0
To Distill or Not to Distill? On the Robustness of Robust Knowledge Distillation | Code | 0
UNIKD: UNcertainty-filtered Incremental Knowledge Distillation for Neural Implicit Representation | Code | 0
Incorporating Graph Information in Transformer-based AMR Parsing | Code | 0
An Empirical Study of Pre-trained Language Models in Simple Knowledge Graph Question Answering | Code | 0
Improving Stance Detection with Multi-Dataset Learning and Knowledge Distillation | Code | 0
Improving Respiratory Sound Classification with Architecture-Agnostic Knowledge Distillation from Ensembles | Code | 0
Spatio-Temporal Branching for Motion Prediction using Motion Increments | Code | 0
Improving Question Answering Performance Using Knowledge Distillation and Active Learning | Code | 0
MedDet: Generative Adversarial Distillation for Efficient Cervical Disc Herniation Detection | Code | 0
AI-KD: Towards Alignment Invariant Face Image Quality Assessment Using Knowledge Distillation | Code | 0
Distilling the Knowledge of Romanian BERTs Using Multiple Teachers | Code | 0
Distilling the Knowledge of Large-scale Generative Models into Retrieval Models for Efficient Open-domain Conversation | Code | 0
Unsupervised Domain Expansion for Visual Categorization | Code | 0
Improving Neural Topic Models with Wasserstein Knowledge Distillation | Code | 0
SpectralKD: A Unified Framework for Interpreting and Distilling Vision Transformers via Spectral Analysis | Code | 0
Improving Neural Architecture Search Image Classifiers via Ensemble Learning | Code | 0
MEND: Meta dEmonstratioN Distillation for Efficient and Effective In-Context Learning | Code | 0
Redefining Normal: A Novel Object-Level Approach for Multi-Object Novelty Detection | Code | 0
Improving Knowledge Distillation via Transferring Learning Ability | Code | 0
Adaptive Prompt Learning with Distilled Connective Knowledge for Implicit Discourse Relation Recognition | Code | 0
Redistributing Low-Frequency Words: Making the Most of Monolingual Data in Non-Autoregressive Translation | Code | 0
Reducing Capacity Gap in Knowledge Distillation with Review Mechanism for Crowd Counting | Code | 0
Improving generalizability of distilled self-supervised speech processing models under distorted settings | Code | 0
Reducing Spatial Fitting Error in Distillation of Denoising Diffusion Models | Code | 0
Improving End-to-End Speech Translation by Imitation-Based Knowledge Distillation with Synthetic Transcripts | Code | 0
Auxiliary Learning for Self-Supervised Video Representation via Similarity-based Knowledge Distillation | Code | 0
Autoregressive Knowledge Distillation through Imitation Learning | Code | 0
An Efficient Memory Module for Graph Few-Shot Class-Incremental Learning | Code | 0
Improving Robustness by Enhancing Weak Subnets | Code | 0
Improving Adversarial Robust Fairness via Anti-Bias Soft Label Distillation | Code | 0
Improved Knowledge Distillation via Teacher Assistant | Code | 0
Collective Relevance Labeling for Passage Retrieval | Code | 0
Improved Knowledge Distillation for Crowd Counting on IoT Device | Code | 0
IE-GAN: An Improved Evolutionary Generative Adversarial Network Using a New Fitness Function and a Generic Crossover Operator | Code | 0
Distilling Stereo Networks for Performant and Efficient Leaner Networks | Code | 0
Page 75 of 85

Benchmark Results

In the leaderboards below, T: denotes the teacher model and S: the student. "Claimed" is the value reported by the authors; none of the results has been independently verified yet.

# | Model | Metric | Claimed | Verified | Status
1 | ScaleKD (T: BEiT-L, S: ViT-B/14) | Top-1 accuracy (%) | 86.43 | n/a | Unverified
2 | ScaleKD (T: Swin-L, S: ViT-B/16) | Top-1 accuracy (%) | 85.53 | n/a | Unverified
3 | ScaleKD (T: Swin-L, S: ViT-S/16) | Top-1 accuracy (%) | 83.93 | n/a | Unverified
4 | ScaleKD (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 83.8 | n/a | Unverified
5 | KD++ (T: RegNetY-16GF, S: ViT-B) | Top-1 accuracy (%) | 83.6 | n/a | Unverified
6 | VkD (T: RegNetY-160, S: DeiT-S) | Top-1 accuracy (%) | 82.9 | n/a | Unverified
7 | SpectralKD (T: Swin-S, S: Swin-T) | Top-1 accuracy (%) | 82.7 | n/a | Unverified
8 | ScaleKD (T: Swin-L, S: ResNet-50) | Top-1 accuracy (%) | 82.55 | n/a | Unverified
9 | DiffKD (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 82.5 | n/a | Unverified
10 | DIST (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 82.3 | n/a | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | SRD (T: ResNet-32x4, S: ShuffleNet-V2) | Top-1 accuracy (%) | 79.86 | n/a | Unverified
2 | ShuffleNet-V2 (T: ResNet-32x4, S: ShuffleNet-V2) | Top-1 accuracy (%) | 78.76 | n/a | Unverified
3 | MV-MR (T: CLIP/ViT-B-16, S: ResNet-50) | Top-1 accuracy (%) | 78.6 | n/a | Unverified
4 | ResNet-8x4 (T: ResNet-32x4, S: ResNet-8x4) | Top-1 accuracy (%) | 78.28 | n/a | Unverified
5 | ResNet-8x4 (T: ResNet-32x4, S: ResNet-8x4 [modified]) | Top-1 accuracy (%) | 78.08 | n/a | Unverified
6 | ReviewKD++ (T: ResNet-32x4, S: ShuffleNet-V2) | Top-1 accuracy (%) | 77.93 | n/a | Unverified
7 | ReviewKD++ (T: ResNet-32x4, S: ShuffleNet-V1) | Top-1 accuracy (%) | 77.68 | n/a | Unverified
8 | ResNet-8x4 (T: ResNet-32x4, S: ResNet-8x4) | Top-1 accuracy (%) | 77.5 | n/a | Unverified
9 | ResNet-8x4 (T: ResNet-32x4, S: ResNet-8x4) | Top-1 accuracy (%) | 76.68 | n/a | Unverified
10 | ResNet-8x4 (T: ResNet-32x4, S: ResNet-8x4) | Top-1 accuracy (%) | 76.31 | n/a | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | LSHFM (T: ResNet-101, S: ResNet-50) | mAP | 93.17 | n/a | Unverified
2 | LSHFM (T: ResNet-101, S: MobileNetV2) | mAP | 90.14 | n/a | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | TIE-KD (T: AdaBins, S: MobileNetV2) | RMSE | 2.43 | n/a | Unverified