
Knowledge Distillation

Knowledge distillation is the process of transferring knowledge from a large model to a smaller one. While large models (such as very deep neural networks or ensembles of many models) have higher knowledge capacity than small models, this capacity might not be fully utilized; a compact student trained to mimic the large teacher's outputs can therefore recover much of the teacher's accuracy at a fraction of the inference cost.
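As a concrete illustration, below is a minimal sketch of the classic soft-target formulation of distillation (Hinton et al., 2015), in which the student matches the teacher's temperature-softened output distribution in addition to the ground-truth labels. The PyTorch framing, function name, temperature, and loss weight here are illustrative assumptions, not a reference implementation.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    # Soften both output distributions with the temperature T.
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # KL term pulls the student toward the teacher's soft targets;
    # the T^2 factor keeps its gradient scale comparable to the hard loss.
    kd_term = F.kl_div(log_student, soft_targets,
                       reduction="batchmean") * temperature ** 2
    # Standard cross-entropy against the ground-truth labels.
    ce_term = F.cross_entropy(student_logits, labels)
    # alpha balances imitating the teacher against fitting the hard labels.
    return alpha * kd_term + (1.0 - alpha) * ce_term

In practice the teacher logits are computed with the teacher in eval mode and gradients disabled, so only the student's parameters are updated.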

Papers

Showing 1851–1900 of 4240 papers

High-Fidelity Pseudo-label Generation by Large Language Models for Training Robust Radiology Report Classifiers
High-dimensional Analysis of Knowledge Distillation: Weak-to-Strong Generalization and Scaling Laws
Hierarchical Transformer-based Large-Context End-to-end ASR with Large-Context Knowledge Distillation
Hierarchical Selective Classification
Hierarchical Knowledge Distillation on Text Graph for Data-limited Attribute Inference
How Does Distilled Data Complexity Impact the Quality and Confidence of Non-Autoregressive Machine Translation?
Hierarchical Knowledge Distillation for Dialogue Sequence Labeling
Deep Collective Knowledge Distillation
A metric learning approach for endoscopic kidney stone identification
How to Backdoor the Knowledge Distillation
HFedCKD: Toward Robust Heterogeneous Federated Learning via Data-free Knowledge Distillation and Two-way Contrast
Heterogeneous Generative Knowledge Distillation with Masked Image Modeling
Heterogeneous Federated Learning Using Knowledge Codistillation
Heterogeneous Federated Knowledge Graph Embedding Learning and Unlearning
Heterogeneous Continual Learning
Heterogeneous-Branch Collaborative Learning for Dialogue Generation
A method for estimating forest carbon storage distribution density via artificial intelligence generated content model
Adaptive Multiplane Image Generation from a Single Internet Picture
A Closer Look at Rehearsal-Free Continual Learning
Heterogeneity-aware Personalized Federated Learning via Adaptive Dual-Agent Reinforcement Learning
HeteFedRec: Federated Recommender Systems with Model Heterogeneity
Decoupling Dark Knowledge via Block-wise Logit Distillation for Feature-level Alignment
HEAT: Hardware-Efficient Automatic Tensor Decomposition for Transformer Compression
Human in the Latent Loop (HILL): Interactively Guiding Model Training Through Human Intuition
Hearing Lips: Improving Lip Reading by Distilling Speech Recognizers
Decouple Non-parametric Knowledge Distillation For End-to-end Speech Translation
HW-TSC’s Participation in the WMT 2020 News Translation Shared Task
HW-TSC’s Participation in the WMT 2021 Large-Scale Multilingual Translation Task
Head-Tail-Aware KL Divergence in Knowledge Distillation for Spiking Neural Networks
Decoupled Transformer for Scalable Inference in Open-domain Question Answering
Headache to Overstock? Promoting Long-tail Items through Debiased Product Bundling
Biologically inspired structure learning with reverse knowledge distillation for spiking neural networks
AMD: Automatic Multi-step Distillation of Large-scale Vision Models
hdl2v: A Code Translation Dataset for Enhanced LLM Verilog Generation
Spectral Maps for Learning on Subgraphs
Harnessing Increased Client Participation with Cohort-Parallel Federated Learning
Harmonizing knowledge Transfer in Neural Network with Unified Distillation
HARD: Hard Augmentations for Robust Distillation
I2D2: Inductive Knowledge Distillation with NeuroLogic and Self-Imitation
I^2KD-SLU: An Intra-Inter Knowledge Distillation Framework for Zero-Shot Cross-Lingual Spoken Language Understanding
Hard Gate Knowledge Distillation -- Leverage Calibration for Robust and Reliable Language Model
IAG: Induction-Augmented Generation Framework for Answering Reasoning Questions
ICD-Face: Intra-class Compactness Distillation for Face Recognition
BiM-VFI: Bidirectional Motion Field-Guided Frame Interpolation for Video with Non-uniform Motions
AMD: Adaptive Masked Distillation for Object Detection
HanjaBridge: Resolving Semantic Ambiguity in Korean LLMs via Hanja-Augmented Pre-Training
If At First You Don't Succeed: Test Time Re-ranking for Zero-shot, Cross-domain Retrieval
Hands-on Guidance for Distilling Object Detectors
Decoupled Alignment for Robust Plug-and-Play Adaptation
Page 38 of 85

Benchmark Results

In the tables below, "T:" identifies the teacher model and "S:" the student model in each distillation setup.

# | Model | Metric | Claimed | Verified | Status
1 | ScaleKD (T: BEiT-L, S: ViT-B/14) | Top-1 accuracy (%) | 86.43 | - | Unverified
2 | ScaleKD (T: Swin-L, S: ViT-B/16) | Top-1 accuracy (%) | 85.53 | - | Unverified
3 | ScaleKD (T: Swin-L, S: ViT-S/16) | Top-1 accuracy (%) | 83.93 | - | Unverified
4 | ScaleKD (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 83.8 | - | Unverified
5 | KD++ (T: regnety-16GF, S: ViT-B) | Top-1 accuracy (%) | 83.6 | - | Unverified
6 | VkD (T: RegNety 160, S: DeiT-S) | Top-1 accuracy (%) | 82.9 | - | Unverified
7 | SpectralKD (T: Swin-S, S: Swin-T) | Top-1 accuracy (%) | 82.7 | - | Unverified
8 | ScaleKD (T: Swin-L, S: ResNet-50) | Top-1 accuracy (%) | 82.55 | - | Unverified
9 | DiffKD (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 82.5 | - | Unverified
10 | DIST (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 82.3 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SRD (T: resnet-32x4, S: shufflenet-v2) | Top-1 accuracy (%) | 79.86 | - | Unverified
2 | shufflenet-v2 (T: resnet-32x4, S: shufflenet-v2) | Top-1 accuracy (%) | 78.76 | - | Unverified
3 | MV-MR (T: CLIP/ViT-B-16, S: resnet50) | Top-1 accuracy (%) | 78.6 | - | Unverified
4 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 78.28 | - | Unverified
5 | resnet8x4 (T: resnet32x4, S: resnet8x4 [modified]) | Top-1 accuracy (%) | 78.08 | - | Unverified
6 | ReviewKD++ (T: resnet-32x4, S: shufflenet-v2) | Top-1 accuracy (%) | 77.93 | - | Unverified
7 | ReviewKD++ (T: resnet-32x4, S: shufflenet-v1) | Top-1 accuracy (%) | 77.68 | - | Unverified
8 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 77.5 | - | Unverified
9 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 76.68 | - | Unverified
10 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 76.31 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | LSHFM (T: ResNet101, S: ResNet50) | mAP | 93.17 | - | Unverified
2 | LSHFM (T: ResNet101, S: MobileNetV2) | mAP | 90.14 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TIE-KD (T: Adabins, S: MobileNetV2) | RMSE | 2.43 | - | Unverified