Knowledge Distillation

Knowledge distillation is the process of transferring knowledge from a large model to a smaller one. While large models (such as very deep neural networks or ensembles of many models) have higher knowledge capacity than small models, this capacity might not be fully utilized.

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 2601–2650 of 4240 papers

Title	Date	Tasks	Status
Scalable Syntax-Aware Language Models Using Knowledge Distillation	Jun 14, 2019	Knowledge DistillationLanguage Modeling	—Unverified
Scale-Equivalent Distillation for Semi-Supervised Object Detection	Mar 23, 2022	Knowledge DistillationObject	—Unverified
ScaleKD: Distilling Scale-Aware Knowledge in Small Object Detector	Jan 1, 2023	Knowledge Distillationobject-detection	—Unverified
ScaleOT: Privacy-utility-scalable Offsite-tuning with Dynamic LayerReplace and Selective Rank Compression	Dec 13, 2024	Knowledge DistillationPrivacy Preserving	—Unverified
Scaling Fair Learning to Hundreds of Intersectional Groups	Sep 29, 2021	AttributeFairness	—Unverified
Scaling Large Vision-Language Models for Enhanced Multimodal Comprehension In Biomedical Image Analysis	Jan 26, 2025	ArticlesHallucination	—Unverified
Scaling Laws for Data-Efficient Visual Transfer Learning	Apr 17, 2025	Knowledge DistillationTransfer Learning	—Unverified
Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective	Dec 18, 2024	Knowledge Distillation	—Unverified
SCARF: Scalable Continual Learning Framework for Memory-efficient Multiple Neural Radiance Fields	Sep 6, 2024	Continual LearningKnowledge Distillation	—Unverified
Scavenging Hyena: Distilling Transformers into Long Convolution Models	Jan 31, 2024	Knowledge Distillation	—Unverified
Scene-adaptive and Region-aware Multi-modal Prompt for Open Vocabulary Object Detection	Jan 1, 2024	Knowledge Distillationobject-detection	—Unverified
Scene-adaptive Knowledge Distillation for Sequential Recommendation via Differentiable Architecture Search	Jul 15, 2021	Knowledge DistillationNeural Architecture Search	—Unverified
Scene-aware Human Pose Generation using Transformer	Aug 4, 2023	Knowledge DistillationScene Understanding	—Unverified
Scene Graph Aided Radiology Report Generation	Mar 8, 2024	DecoderKnowledge Distillation	—Unverified
Scheduled Knowledge Acquisition on Lightweight Vector Symbolic Architectures for Brain-Computer Interfaces	Mar 18, 2024	Feature EngineeringKnowledge Distillation	—Unverified
Sci-CoT: Leveraging Large Language Models for Enhanced Knowledge Distillation in Small Models for Scientific QA	Aug 9, 2023	ARCKnowledge Distillation	—Unverified
SCLIFD:Supervised Contrastive Knowledge Distillation for Incremental Fault Diagnosis under Limited Fault Data	Feb 12, 2023	class-incremental learningClass Incremental Learning	—Unverified
SDBERT: SparseDistilBERT, a faster and smaller BERT model	Jul 28, 2022	Knowledge Distillation	—Unverified
SDDGR: Stable Diffusion-based Deep Generative Replay for Class Incremental Object Detection	Feb 27, 2024	class-incremental learningClass Incremental Learning	—Unverified
SDQ: Stochastic Differentiable Quantization with Mixed Precision	Jun 9, 2022	Knowledge DistillationNeural Architecture Search	—Unverified
Search for Better Students to Learn Distilled Knowledge	Jan 30, 2020	Knowledge DistillationModel Compression	—Unverified
Searching for COMETINHO: The Little Metric That Could	Jun 1, 2022	Computational EfficiencyKnowledge Distillation	—Unverified
Search to Distill: Pearls are Everywhere but not the Eyes	Nov 20, 2019	Ensemble LearningFace Recognition	—Unverified
SeCoKD: Aligning Large Language Models for In-Context Learning with Fewer Shots	Jun 20, 2024	In-Context LearningKnowledge Distillation	—Unverified
Secost: Sequential co-supervision for large scale weakly labeled audio event detection	Oct 25, 2019	Event DetectionKnowledge Distillation	—Unverified
Secure Your Ride: Real-time Matching Success Rate Prediction for Passenger-Driver Pairs	Sep 14, 2021	Decision MakingKnowledge Distillation	—Unverified
SEDD-PCC: A Single Encoder-Dual Decoder Framework For End-To-End Learned Point Cloud Compression	May 22, 2025	AttributeDecoder	—Unverified
Segment Any RGB-Thermal Model with Language-aided Distillation	May 4, 2025	Instance SegmentationKnowledge Distillation	—Unverified
SEKI: Self-Evolution and Knowledge Inspiration based Neural Architecture Search via Large Language Models	Feb 27, 2025	GPUKnowledge Distillation	—Unverified
Select and Distill: Selective Dual-Teacher Knowledge Transfer for Continual Learning on Vision-Language Models	Mar 14, 2024	Continual LearningKnowledge Distillation	—Unverified
Selecting Related Knowledge via Efficient Channel Attention for Online Continual Learning	Sep 9, 2022	Continual LearningKnowledge Distillation	—Unverified
SelectiveKD: A semi-supervised framework for cancer detection in DBT through Knowledge Distillation and Pseudo-labeling	Sep 25, 2024	Cancer ClassificationKnowledge Distillation	—Unverified
Selective Knowledge Distillation for Non-Autoregressive Neural Machine Translation	Mar 31, 2023	Knowledge DistillationMachine Translation	—Unverified
Self-Cooperation Knowledge Distillation for Novel Class Discovery	Jul 2, 2024	Knowledge DistillationNovel Class Discovery	—Unverified
Self-Distillation Amplifies Regularization in Hilbert Space	Feb 13, 2020	Knowledge DistillationL2 Regularization	—Unverified
Self-Distillation Learning Based on Temporal-Spatial Consistency for Spiking Neural Networks	Jun 12, 2024	Knowledge Distillation	—Unverified
Self-Distillation Mixup Training for Non-autoregressive Neural Machine Translation	Dec 22, 2021	Knowledge DistillationMachine Translation	—Unverified
Self-distillation with Batch Knowledge Ensembling Improves ImageNet Classification	Apr 27, 2021	ClassificationGeneral Classification	—Unverified
Self-Distilled Pruning Of Neural Networks	Sep 29, 2021	Knowledge DistillationLanguage Modeling	—Unverified
Self-Distilled Pruning of Neural Networks	Nov 16, 2021	Knowledge DistillationLanguage Modeling	—Unverified
Self-Evolution Knowledge Distillation for LLM-based Machine Translation	Dec 19, 2024	Knowledge DistillationMachine Translation	—Unverified
A New Training Framework for Deep Neural Network	Mar 12, 2021	Knowledge Distillation	—Unverified
SELF-KNOWLEDGE DISTILLATION ADVERSARIAL ATTACK	Sep 25, 2019	Adversarial AttackKnowledge Distillation	—Unverified
Self-Knowledge Distillation based Self-Supervised Learning for Covid-19 Detection from Chest X-Ray Images	Jun 7, 2022	Knowledge DistillationSelf-Knowledge Distillation	—Unverified
Self-Knowledge Distillation for Learning Ambiguity	Jun 14, 2024	Knowledge DistillationNatural Language Understanding	—Unverified
Self-Knowledge Distillation for Surgical Phase Recognition	Jun 15, 2023	DecoderKnowledge Distillation	—Unverified
Self-Knowledge Distillation in Natural Language Processing	Aug 2, 2019	Deep LearningKnowledge Distillation	—Unverified
Self-Knowledge Distillation via Dropout	Aug 11, 2022	Adversarial Robustnessimage-classification	—Unverified
Self-Referenced Deep Learning	Nov 19, 2018	Deep LearningKnowledge Distillation	—Unverified
Self Regulated Learning Mechanism for Data Efficient Knowledge Distillation	Feb 14, 2021	Knowledge DistillationTransfer Learning	—Unverified

Show:10 25 50

← PrevPage 53 of 85Next →

All datasets ImageNet CIFAR-100 COCO (Common Objects in Context)COCO 2017 val PASCAL VOC KITTI

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	ScaleKD (T:BEiT-L S:ViT-B/14)	Top-1 accuracy %	86.43	—	Unverified
2	ScaleKD (T:Swin-L S:ViT-B/16)	Top-1 accuracy %	85.53	—	Unverified
3	ScaleKD (T:Swin-L S:ViT-S/16)	Top-1 accuracy %	83.93	—	Unverified
4	ScaleKD (T:Swin-L S:Swin-T)	Top-1 accuracy %	83.8	—	Unverified
5	KD++(T: regnety-16GF S:ViT-B)	Top-1 accuracy %	83.6	—	Unverified
6	VkD (T:RegNety 160 S:DeiT-S)	Top-1 accuracy %	82.9	—	Unverified
7	SpectralKD (T:Swin-S S:Swin-T)	Top-1 accuracy %	82.7	—	Unverified
8	ScaleKD (T:Swin-L S:ResNet-50)	Top-1 accuracy %	82.55	—	Unverified
9	DiffKD (T:Swin-L S: Swin-T)	Top-1 accuracy %	82.5	—	Unverified
10	DIST (T: Swin-L S: Swin-T)	Top-1 accuracy %	82.3	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	SRD (T:resnet-32x4, S:shufflenet-v2)	Top-1 Accuracy (%)	79.86	—	Unverified
2	shufflenet-v2(T:resnet-32x4, S:shufflenet-v2)	Top-1 Accuracy (%)	78.76	—	Unverified
3	MV-MR (T: CLIP/ViT-B-16 S: resnet50)	Top-1 Accuracy (%)	78.6	—	Unverified
4	resnet8x4 (T: resnet32x4 S: resnet8x4)	Top-1 Accuracy (%)	78.28	—	Unverified
5	resnet8x4 (T: resnet32x4 S: resnet8x4 [modified])	Top-1 Accuracy (%)	78.08	—	Unverified
6	ReviewKD++(T:resnet-32x4, S:shufflenet-v2)	Top-1 Accuracy (%)	77.93	—	Unverified
7	ReviewKD++(T:resnet-32x4, S:shufflenet-v1)	Top-1 Accuracy (%)	77.68	—	Unverified
8	resnet8x4 (T: resnet32x4 S: resnet8x4)	Top-1 Accuracy (%)	77.5	—	Unverified
9	resnet8x4 (T: resnet32x4 S: resnet8x4)	Top-1 Accuracy (%)	76.68	—	Unverified
10	resnet8x4 (T: resnet32x4 S: resnet8x4)	Top-1 Accuracy (%)	76.31	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	LSHFM (T: ResNet101 S: ResNet50)	mAP	77.16	—	Unverified
2	LSHFM (T: ResNet101 S: MobileNetV2)	mAP	73.73	—	Unverified
3	ADLIK-Faster (T: Faster R-CNN vit-base S: Faster R-CNN deit-small)	box AP	47.6	—	Unverified
4	ADLIK-Mask (T: Mask R-CNN vit-base S: Mask R-CNN deit-small)	mask AP	42.4	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	ReviewKD++(T: faster rcnn(resnet101), S:faster rcnn(resnet50))	AP@0.5	61.8	—	Unverified
2	ReviewKD++(T: faster rcnn(resnet101), S:faster rcnn(resnet18))	AP@0.5	57.96	—	Unverified
3	ReviewKD++(T: faster rcnn(resnet101), S:faster rcnn(mobilenet-v2))	AP@0.5	55.18	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	LSHFM (T: ResNet101 S: ResNet50)	mAP	93.17	—	Unverified
2	LSHFM (T: ResNet101 S: MobileNetV2)	mAP	90.14	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	TIE-KD (T: Adabins S: MobileNetV2)	RMSE	2.43	—	Unverified