Knowledge Distillation

Knowledge distillation is the process of transferring knowledge from a large model to a smaller one. While large models (such as very deep neural networks or ensembles of many models) have higher knowledge capacity than small models, this capacity might not be fully utilized.

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 2251–2300 of 4240 papers

Title	Date	Tasks	Status
Explainable LLM-driven Multi-dimensional Distillation for E-Commerce Relevance Learning	Nov 20, 2024	Knowledge DistillationLarge Language Model	—Unverified
Explaining Knowledge Distillation by Quantifying the Knowledge	Mar 7, 2020	Knowledge Distillation	—Unverified
Explaining Knowledge Graph Embedding via Latent Rule Learning	Sep 29, 2021	Graph EmbeddingKnowledge Distillation	—Unverified
Explaining Sequence-Level Knowledge Distillation as Data-Augmentation for Neural Machine Translation	Dec 6, 2019	Data AugmentationKnowledge Distillation	—Unverified
Explicit and Implicit Knowledge Distillation via Unlabeled Data	Feb 17, 2023	Data-free Knowledge DistillationKnowledge Distillation	—Unverified
Explicit Connection Distillation	Jan 1, 2021	image-classificationImage Classification	—Unverified
Explicit Knowledge Transfer for Weakly-Supervised Code Generation	Nov 30, 2022	Code GenerationFew-Shot Learning	—Unverified
Exploiting Knowledge Distillation for Few-Shot Image Generation	Sep 29, 2021	DiversityImage Generation	—Unverified
Exploiting Unlabelled Photos for Stronger Fine-Grained SBIR	Mar 24, 2023	Image RetrievalKnowledge Distillation	—Unverified
Exploring and Enhancing the Transfer of Distribution in Knowledge Distillation for Autoregressive Language Models	Sep 19, 2024	Knowledge Distillation	—Unverified
Exploring compressibility of transformer based text-to-music (TTM) models	Jun 24, 2024	DecoderFAD	—Unverified
Exploring Dark Knowledge under Various Teacher Capacities and Addressing Capacity Mismatch	May 21, 2024	Knowledge Distillation	—Unverified
Exploring Dual Model Knowledge Distillation for Anomaly Detection	Jun 27, 2023	Anomaly Detectionfeature selection	—Unverified
Exploring Extreme Quantization in Spiking Language Models	May 4, 2024	Knowledge DistillationLanguage Modeling	—Unverified
Exploring Knowledge Distillation of a Deep Neural Network for Multi-Script identification	Feb 20, 2021	Knowledge DistillationTransfer Learning	—Unverified
Fully Synthetic Data Improves Neural Machine Translation with Knowledge Distillation	Dec 31, 2020	Knowledge DistillationMachine Translation	—Unverified
Exploring Multi-Modal Contextual Knowledge for Open-Vocabulary Object Detection	Aug 30, 2023	Knowledge DistillationLanguage Modeling	—Unverified
Exploring Self- and Cross-Triplet Correlations for Human-Object Interaction Detection	Jan 11, 2024	Human-Object Interaction DetectionKnowledge Distillation	—Unverified
A Note on Knowledge Distillation Loss Function for Object Classification	Sep 14, 2021	Knowledge DistillationModel Compression	—Unverified
Exploring the Limits of Simple Learners in Knowledge Distillation for Document Classification with DocBERT	Jul 1, 2020	Document ClassificationGeneral Classification	—Unverified
Extending Label Smoothing Regularization with Self-Knowledge Distillation	Sep 11, 2020	Knowledge DistillationSelf-Knowledge Distillation	—Unverified
Extracting General-use Transformers for Low-resource Languages via Knowledge Distillation	Jan 22, 2025	Knowledge Distillation	—Unverified
Extracting knowledge from features with multilevel abstraction	Dec 4, 2021	Data AugmentationKnowledge Distillation	—Unverified
Extract then Distill: Efficient and Effective Task-Agnostic BERT Distillation	Apr 24, 2021	Knowledge Distillation	—Unverified
Extracurricular Learning: Knowledge Transfer Beyond Empirical Distribution	Jun 30, 2020	Image ClassificationKnowledge Distillation	—Unverified
Extreme Compression for Pre-trained Transformers Made Simple and Efficient	Jun 4, 2022	Knowledge DistillationQuantization	—Unverified
Extreme compression of sentence-transformer ranker models: faster inference, longer battery life, and less storage on edge devices	Jun 29, 2022	Dimensionality ReductionKnowledge Distillation	—Unverified
Extremely Small BERT Models from Mixed-Vocabulary Training	Sep 25, 2019	Knowledge DistillationLanguage Modelling	—Unverified
Face to Cartoon Incremental Super-Resolution using Knowledge Distillation	Jan 27, 2024	HallucinationIncremental Learning	—Unverified
Factorized Distillation: Training Holistic Person Re-identification Model by Distilling an Ensemble of Partial ReID Models	Nov 20, 2018	Knowledge DistillationPerson Re-Identification	—Unverified
Factorized RVQ-GAN For Disentangled Speech Tokenization	Jun 18, 2025	DisentanglementKnowledge Distillation	—Unverified
Factual Dialogue Summarization via Learning from Large Language Models	Jun 20, 2024	Contrastive LearningData Augmentation	—Unverified
Selective Cross-Task Distillation	Apr 25, 2022	Knowledge Distillation	—Unverified
Failure-Resilient Distributed Inference with Model Compression over Heterogeneous Edge Devices	Jun 20, 2024	Knowledge DistillationModel Compression	—Unverified
Fair Feature Distillation for Visual Recognition	May 27, 2021	FairnessKnowledge Distillation	—Unverified
Fair Feature Importance Scores for Interpreting Tree-Based Methods and Surrogates	Oct 6, 2023	FairnessFeature Importance	—Unverified
Fairly Predicting Graft Failure in Liver Transplant for Organ Assigning	Feb 18, 2023	FairnessKnowledge Distillation	—Unverified
Fairness Continual Learning Approach to Semantic Scene Understanding in Open-World Environments	May 25, 2023	Continual LearningContinual Semantic Segmentation	—Unverified
Fair Text to Medical Image Diffusion Model with Subgroup Distribution Aligned Tuning	Jun 21, 2024	Knowledge Distillation	—Unverified
Faithful Knowledge Distillation	Jun 7, 2023	Adversarial RobustnessKnowledge Distillation	—Unverified
Fake It Till Make It: Federated Learning with Consensus-Oriented Generation	Dec 10, 2023	Federated LearningKnowledge Distillation	—Unverified
Fall Detection using Knowledge Distillation Based Long short-term memory for Offline Embedded and Low Power Devices	Aug 24, 2023	Knowledge DistillationTime Series	—Unverified
False Negative Distillation and Contrastive Learning for Personalized Outfit Recommendation	Oct 13, 2021	Contrastive LearningData Augmentation	—Unverified
FAN-Trans: Online Knowledge Distillation for Facial Action Unit Detection	Nov 11, 2022	Action Unit DetectionFace Alignment	—Unverified
Fast and Efficient Once-For-All Networks for Diverse Hardware Deployment	Sep 29, 2021	AllGPU	—Unverified
Fast and High-Performance Learned Image Compression With Improved Checkerboard Context Model, Deformable Residual Module, and Knowledge Distillation	Sep 5, 2023	Image CompressionKnowledge Distillation	—Unverified
Fast DistilBERT on CPUs	Oct 27, 2022	Knowledge DistillationModel Compression	—Unverified
Fast End-to-end Coreference Resolution for Korean	Nov 1, 2020	coreference-resolutionCoreference Resolution	—Unverified
FasterAI: A Lightweight Library for Creating Sparse Neural Networks	Jul 3, 2022	Knowledge Distillation	—Unverified
Faster Inference of Integer SWIN Transformer by Removing the GELU Activation	Feb 2, 2024	GPUimage-classification	—Unverified

Show:10 25 50

← PrevPage 46 of 85Next →

All datasets ImageNet CIFAR-100 COCO (Common Objects in Context)COCO 2017 val PASCAL VOC KITTI

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	ScaleKD (T:BEiT-L S:ViT-B/14)	Top-1 accuracy %	86.43	—	Unverified
2	ScaleKD (T:Swin-L S:ViT-B/16)	Top-1 accuracy %	85.53	—	Unverified
3	ScaleKD (T:Swin-L S:ViT-S/16)	Top-1 accuracy %	83.93	—	Unverified
4	ScaleKD (T:Swin-L S:Swin-T)	Top-1 accuracy %	83.8	—	Unverified
5	KD++(T: regnety-16GF S:ViT-B)	Top-1 accuracy %	83.6	—	Unverified
6	VkD (T:RegNety 160 S:DeiT-S)	Top-1 accuracy %	82.9	—	Unverified
7	SpectralKD (T:Swin-S S:Swin-T)	Top-1 accuracy %	82.7	—	Unverified
8	ScaleKD (T:Swin-L S:ResNet-50)	Top-1 accuracy %	82.55	—	Unverified
9	DiffKD (T:Swin-L S: Swin-T)	Top-1 accuracy %	82.5	—	Unverified
10	DIST (T: Swin-L S: Swin-T)	Top-1 accuracy %	82.3	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	SRD (T:resnet-32x4, S:shufflenet-v2)	Top-1 Accuracy (%)	79.86	—	Unverified
2	shufflenet-v2(T:resnet-32x4, S:shufflenet-v2)	Top-1 Accuracy (%)	78.76	—	Unverified
3	MV-MR (T: CLIP/ViT-B-16 S: resnet50)	Top-1 Accuracy (%)	78.6	—	Unverified
4	resnet8x4 (T: resnet32x4 S: resnet8x4)	Top-1 Accuracy (%)	78.28	—	Unverified
5	resnet8x4 (T: resnet32x4 S: resnet8x4 [modified])	Top-1 Accuracy (%)	78.08	—	Unverified
6	ReviewKD++(T:resnet-32x4, S:shufflenet-v2)	Top-1 Accuracy (%)	77.93	—	Unverified
7	ReviewKD++(T:resnet-32x4, S:shufflenet-v1)	Top-1 Accuracy (%)	77.68	—	Unverified
8	resnet8x4 (T: resnet32x4 S: resnet8x4)	Top-1 Accuracy (%)	77.5	—	Unverified
9	resnet8x4 (T: resnet32x4 S: resnet8x4)	Top-1 Accuracy (%)	76.68	—	Unverified
10	resnet8x4 (T: resnet32x4 S: resnet8x4)	Top-1 Accuracy (%)	76.31	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	LSHFM (T: ResNet101 S: ResNet50)	mAP	77.16	—	Unverified
2	LSHFM (T: ResNet101 S: MobileNetV2)	mAP	73.73	—	Unverified
3	ADLIK-Faster (T: Faster R-CNN vit-base S: Faster R-CNN deit-small)	box AP	47.6	—	Unverified
4	ADLIK-Mask (T: Mask R-CNN vit-base S: Mask R-CNN deit-small)	mask AP	42.4	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	ReviewKD++(T: faster rcnn(resnet101), S:faster rcnn(resnet50))	AP@0.5	61.8	—	Unverified
2	ReviewKD++(T: faster rcnn(resnet101), S:faster rcnn(resnet18))	AP@0.5	57.96	—	Unverified
3	ReviewKD++(T: faster rcnn(resnet101), S:faster rcnn(mobilenet-v2))	AP@0.5	55.18	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	LSHFM (T: ResNet101 S: ResNet50)	mAP	93.17	—	Unverified
2	LSHFM (T: ResNet101 S: MobileNetV2)	mAP	90.14	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	TIE-KD (T: Adabins S: MobileNetV2)	RMSE	2.43	—	Unverified