SOTAVerified

Knowledge Distillation

Knowledge distillation is the process of transferring knowledge from a large model to a smaller one. While large models (such as very deep neural networks or ensembles of many models) have higher knowledge capacity than small models, this capacity might not be fully utilized, so a compact student trained to mimic the large teacher's outputs can often approach the teacher's accuracy at a fraction of the inference cost.
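
A common formulation, following Hinton et al. (2015), trains the student on a weighted mix of the hard-label cross-entropy loss and a temperature-softened KL divergence against the teacher's logits. The sketch below assumes PyTorch; the function name, tensor arguments, temperature, and alpha weighting are illustrative defaults, not values taken from any paper listed on this page.

```python
# Minimal sketch of a soft-target distillation loss (Hinton et al., 2015 style).
# Names and hyperparameters are illustrative assumptions, not from a specific paper.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 4.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """Blend hard-label cross-entropy with a temperature-softened KL term."""
    # Soft targets: match the student's softened distribution to the teacher's.
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    kd_term = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * temperature ** 2

    # Hard targets: standard supervised loss on the ground-truth labels.
    ce_term = F.cross_entropy(student_logits, labels)

    return alpha * ce_term + (1.0 - alpha) * kd_term
```

Scaling the KL term by temperature squared keeps its gradient magnitude comparable to the cross-entropy term as the temperature grows, which is why both terms can share a single mixing weight.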

Papers

Showing 1501–1550 of 4240 papers

Title | Status | Hype
HVDistill: Transferring Knowledge from Images to Point Clouds via Unsupervised Hybrid-View Distillation | Code | 0
On the Transferability of Visual Features in Generalized Zero-Shot Learning | Code | 0
Hybrid Attention Model Using Feature Decomposition and Knowledge Distillation for Glucose Forecasting | Code | 0
Hybrid Data-Free Knowledge Distillation | Code | 0
Applying Knowledge Distillation to Improve Weed Mapping With Drones | Code | 0
Chemical transformer compression for accelerating both training and inference of molecular modeling | Code | 0
Distribution Aligned Semantics Adaption for Lifelong Person Re-Identification | Code | 0
Facilitating NSFW Text Detection in Open-Domain Dialogue Systems via Knowledge Distillation | Code | 0
Facilitating Pornographic Text Detection for Open-Domain Dialogue Systems via Knowledge Distillation of Large Language Models | Code | 0
Distributed Soft Actor-Critic with Multivariate Reward Representation and Knowledge Distillation | Code | 0
TinyBERT: Distilling BERT for Natural Language Understanding | Code | 0
HTR-JAND: Handwritten Text Recognition with Joint Attention Network and Knowledge Distillation | Code | 0
Human Guided Exploitation of Interpretable Attention Patterns in Summarization and Topic Segmentation | Code | 0
Image Recognition with Online Lightweight Vision Transformer: A Survey | Code | 0
Invariant debiasing learning for recommendation via biased imputation | Code | 0
How Knowledge Distillation Mitigates the Synthetic Gap in Fair Face Recognition | Code | 0
HiTSR: A Hierarchical Transformer for Reference-based Super-Resolution | Code | 0
Holistic White-light Polyp Classification via Alignment-free Dense Distillation of Auxiliary Optical Chromoendoscopy | Code | 0
Highlight Every Step: Knowledge Distillation via Collaborative Teaching | Code | 0
HDKD: Hybrid Data-Efficient Knowledge Distillation Network for Medical Image Classification | Code | 0
Distill n' Explain: explaining graph neural networks using simple surrogates | Code | 0
Guiding Frame-Level CTC Alignments Using Self-knowledge Distillation | Code | 0
Distilling Virtual Examples for Long-tailed Recognition | Code | 0
FAKD: Feature Augmented Knowledge Distillation for Semantic Segmentation | Code | 0
Distilling Universal and Joint Knowledge for Cross-Domain Model Compression on Time Series Data | Code | 0
GSB: Group Superposition Binarization for Vision Transformer with Limited Training Samples | Code | 0
A Dual-Contrastive Framework for Low-Resource Cross-Lingual Named Entity Recognition | Code | 0
Distilling the Undistillable: Learning from a Nasty Teacher | Code | 0
Group Multi-View Transformer for 3D Shape Analysis with Spatial Encoding | Code | 0
GSSF: Generalized Structural Sparse Function for Deep Cross-modal Metric Learning | Code | 0
Distilling the Knowledge of Romanian BERTs Using Multiple Teachers | Code | 0
Distilling the Knowledge of Large-scale Generative Models into Retrieval Models for Efficient Open-domain Conversation | Code | 0
CDFKD-MFS: Collaborative Data-free Knowledge Distillation via Multi-level Feature Sharing | Code | 0
An Unsupervised Multiple-Task and Multiple-Teacher Model for Cross-lingual Named Entity Recognition | Code | 0
Greedy-layer Pruning: Speeding up Transformer Models for Natural Language Processing | Code | 0
Graph Knowledge Distillation to Mixture of Experts | Code | 0
Handling Data Heterogeneity in Federated Learning via Knowledge Distillation and Fusion | Code | 0
Dynamic Data-Free Knowledge Distillation by Easy-to-Hard Learning Strategy | Code | 0
Distilling Stereo Networks for Performant and Efficient Leaner Networks | Code | 0
Graph-based Knowledge Distillation by Multi-head Attention Network | Code | 0
Cooperative Classification and Rationalization for Graph Generalization | Code | 0
Gradient Knowledge Distillation for Pre-trained Language Models | Code | 0
Graph Entropy Minimization for Semi-supervised Node Classification | Code | 0
Spending Your Winning Lottery Better After Drawing It | Code | 0
GOTHAM: Graph Class Incremental Learning Framework under Weak Supervision | Code | 0
Goldfish: An Efficient Federated Unlearning Framework | Code | 0
Answering Diverse Questions via Text Attached with Key Audio-Visual Clues | Code | 0
GNN's Uncertainty Quantification using Self-Distillation | Code | 0
Distilling Object Detectors With Global Knowledge | Code | 0
Goal-Conditioned Q-Learning as Knowledge Distillation | Code | 0
Page 31 of 85

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | ScaleKD (T:BEiT-L S:ViT-B/14) | Top-1 accuracy (%) | 86.43 | – | Unverified
2 | ScaleKD (T:Swin-L S:ViT-B/16) | Top-1 accuracy (%) | 85.53 | – | Unverified
3 | ScaleKD (T:Swin-L S:ViT-S/16) | Top-1 accuracy (%) | 83.93 | – | Unverified
4 | ScaleKD (T:Swin-L S:Swin-T) | Top-1 accuracy (%) | 83.8 | – | Unverified
5 | KD++ (T:regnety-16GF S:ViT-B) | Top-1 accuracy (%) | 83.6 | – | Unverified
6 | VkD (T:RegNety 160 S:DeiT-S) | Top-1 accuracy (%) | 82.9 | – | Unverified
7 | SpectralKD (T:Swin-S S:Swin-T) | Top-1 accuracy (%) | 82.7 | – | Unverified
8 | ScaleKD (T:Swin-L S:ResNet-50) | Top-1 accuracy (%) | 82.55 | – | Unverified
9 | DiffKD (T:Swin-L S:Swin-T) | Top-1 accuracy (%) | 82.5 | – | Unverified
10 | DIST (T:Swin-L S:Swin-T) | Top-1 accuracy (%) | 82.3 | – | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SRD (T:resnet-32x4 S:shufflenet-v2) | Top-1 Accuracy (%) | 79.86 | – | Unverified
2 | shufflenet-v2 (T:resnet-32x4 S:shufflenet-v2) | Top-1 Accuracy (%) | 78.76 | – | Unverified
3 | MV-MR (T:CLIP/ViT-B-16 S:resnet50) | Top-1 Accuracy (%) | 78.6 | – | Unverified
4 | resnet8x4 (T:resnet32x4 S:resnet8x4) | Top-1 Accuracy (%) | 78.28 | – | Unverified
5 | resnet8x4 (T:resnet32x4 S:resnet8x4 [modified]) | Top-1 Accuracy (%) | 78.08 | – | Unverified
6 | ReviewKD++ (T:resnet-32x4 S:shufflenet-v2) | Top-1 Accuracy (%) | 77.93 | – | Unverified
7 | ReviewKD++ (T:resnet-32x4 S:shufflenet-v1) | Top-1 Accuracy (%) | 77.68 | – | Unverified
8 | resnet8x4 (T:resnet32x4 S:resnet8x4) | Top-1 Accuracy (%) | 77.5 | – | Unverified
9 | resnet8x4 (T:resnet32x4 S:resnet8x4) | Top-1 Accuracy (%) | 76.68 | – | Unverified
10 | resnet8x4 (T:resnet32x4 S:resnet8x4) | Top-1 Accuracy (%) | 76.31 | – | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | LSHFM (T:ResNet101 S:ResNet50) | mAP | 93.17 | – | Unverified
2 | LSHFM (T:ResNet101 S:MobileNetV2) | mAP | 90.14 | – | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TIE-KD (T:Adabins S:MobileNetV2) | RMSE | 2.43 | – | Unverified