Knowledge Distillation

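Knowledge distillation is the process of transferring knowledge from a large model to a smaller one. Large models (such as very deep neural networks or ensembles of many models) have higher knowledge capacity than small models. However, that capacity might not be fully utilized, and a large model is just as expensive to evaluate even when it uses little of it. Distillation trains a smaller model to reproduce the large model's behavior so that it can be deployed at lower cost.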
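One common formulation is the soft-target loss popularized by Hinton et al. (2015). The small "student" is trained to match the temperature-softened output distribution of the large "teacher", alongside the usual hard-label loss. What follows is a minimal pure-Python sketch for a single example, assuming plain lists of logits. The temperature of 4.0 and weight `alpha=0.5` are illustrative defaults, not values taken from any paper listed below, and the function names are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; higher temperatures give softer distributions."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: List[float], q: List[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(
    student_logits: List[float],
    teacher_logits: List[float],
    label: int,
    temperature: float = 4.0,  # illustrative value (assumption)
    alpha: float = 0.5,        # weight on the soft-target term (assumption)
) -> float:
    """Weighted sum of the soft-target KL term and hard-label cross-entropy."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # Scaling by T^2 keeps soft-target gradients on a scale comparable to the hard loss
    soft = kl_divergence(p_teacher, p_student) * temperature ** 2
    # Hard-label cross-entropy uses the ordinary (T = 1) student distribution
    hard = -math.log(softmax(student_logits)[label])
    return alpha * soft + (1 - alpha) * hard
```

A call such as `distillation_loss([3.0, 1.5, 0.2], [5.0, 2.0, 0.5], label=0)` scores one student prediction against one teacher prediction. Real training code would compute the same quantities on batched tensors in a deep-learning framework. The per-element loops here are only meant to make each term visible.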

Papers


Showing 3901–3950 of 4240 papers

Title	Date	Tasks	Status
Label Semantic Knowledge Distillation for Unbiased Scene Graph Generation	Aug 7, 2022	Graph Generation, Knowledge Distillation	Unverified
LadaBERT: Lightweight Adaptation of BERT through Hybrid Model Compression	Apr 8, 2020	Blocking, Knowledge Distillation	Unverified
LaDiMo: Layer-wise Distillation Inspired MoEfier	Aug 8, 2024	Knowledge Distillation, Mixture-of-Experts	Unverified
LAKD-Activation Mapping Distillation Based on Local Learning	Aug 21, 2024	Knowledge Distillation	Unverified
LAMeTA: Intent-Aware Agentic Network Optimization via a Large AI Model-Empowered Two-Stage Approach	May 18, 2025	Deep Reinforcement Learning, Knowledge Distillation	Unverified
Language Graph Distillation for Low-Resource Machine Translation	Aug 17, 2019	Knowledge Distillation, Machine Translation	Unverified
Language Modelling via Learning to Rank	Oct 13, 2021	Knowledge Distillation, Language Modelling	Unverified
Language-Oriented Communication with Semantic Coding and Knowledge Distillation for Text-to-Image Generation	Sep 20, 2023	Image Generation, In-Context Learning	Unverified
LAPTOP-Diff: Layer Pruning and Normalized Distillation for Compressing Diffusion Models	Apr 17, 2024	Knowledge Distillation	Unverified
Just CHOP: Embarrassingly Simple LLM Compression	May 24, 2023	Knowledge Distillation, Language Modeling	Unverified
Large Language Model Guided Knowledge Distillation for Time Series Anomaly Detection	Jan 26, 2024	Anomaly Detection, Knowledge Distillation	Unverified
Large Language Model Meets Graph Neural Network in Knowledge Distillation	Feb 8, 2024	Contrastive Learning, Graph Attention	Unverified
Large Model for Small Data: Foundation Model for Cross-Modal RF Human Activity Recognition	Oct 13, 2024	Activity Recognition, Few-Shot Learning	Unverified
Large-Scale Generative Data-Free Distillation	Dec 10, 2020	Knowledge Distillation, Model Compression	Unverified
LaSNN: Layer-wise ANN-to-SNN Distillation for Effective and Efficient Training in Deep Spiking Neural Networks	Apr 17, 2023	Knowledge Distillation	Unverified
Layer Attack Unlearning: Fast and Accurate Machine Unlearning via Layer Level Attack and Knowledge Distillation	Dec 28, 2023	Knowledge Distillation, Machine Unlearning	Unverified
LayerCollapse: Adaptive compression of neural networks	Nov 29, 2023	Computational Efficiency, image-classification	Unverified
Layer Importance for Mathematical Reasoning is Forged in Pre-Training and Invariant after Post-Training	Jun 27, 2025	Knowledge Distillation, Mathematical Reasoning	Unverified
Layerwise Bregman Representation Learning with Applications to Knowledge Distillation	Sep 15, 2022	Knowledge Distillation, Representation Learning	Unverified
Noisy Data Meets Privacy: Training Local Models with Post-Processed Remote Queries	May 25, 2024	Knowledge Distillation, Model extraction	Unverified
LEAD: Liberal Feature-based Distillation for Dense Retrieval	Dec 10, 2022	Document Ranking, Knowledge Distillation	Unverified
LEALLA: Learning Lightweight Language-agnostic Sentence Embeddings with Knowledge Distillation	Feb 16, 2023	Knowledge Distillation, Sentence	Unverified
Learnable Cross-modal Knowledge Distillation for Multi-modal Learning with Missing Modality	Oct 2, 2023	Knowledge Distillation	Unverified
Learn from Balance: Rectifying Knowledge Transfer for Long-Tailed Scenarios	Sep 12, 2024	Knowledge Distillation, Transfer Learning	Unverified
Learn From the Past: Experience Ensemble Knowledge Distillation	Feb 25, 2022	Knowledge Distillation, Transfer Learning	Unverified
Learning an Augmented RGB Representation with Cross-Modal Knowledge Distillation for Action Detection	Aug 8, 2021	Action Detection, Knowledge Distillation	Unverified
Learning Background Prompts to Discover Implicit Knowledge for Open Vocabulary Object Detection	Jun 1, 2024	Knowledge Distillation, Object	Unverified
Learning Bayesian Sparse Networks with Full Experience Replay for Continual Learning	Feb 21, 2022	Continual Learning, Knowledge Distillation	Unverified
Learning by Distillation: A Self-Supervised Learning Framework for Optical Flow Estimation	Jun 8, 2021	Knowledge Distillation, Optical Flow Estimation	Unverified
Learning Cross-Lingual IR from an English Retriever	Jan 16, 2022	Cross-Lingual Information Retrieval, Information Retrieval	Unverified
Diverse Knowledge Distillation (DKD): A Solution for Improving The Robustness of Ensemble Models Against Adversarial Attacks	Jun 26, 2020	Ensemble Learning, image-classification	Unverified
Learning Efficient Image Super-Resolution Networks via Structure-Regularized Pruning	Sep 29, 2021	Image Super-Resolution, Knowledge Distillation	Unverified
Learning Efficient Object Detection Models with Knowledge Distillation	Dec 1, 2017	Knowledge Distillation, Model Compression	Unverified
Learning from a Lightweight Teacher for Efficient Knowledge Distillation	May 19, 2020	Knowledge Distillation	Unverified
Learning From Biased Soft Labels	Feb 16, 2023	Knowledge Distillation	Unverified
Learning from deep model via exploring local targets	Jan 1, 2021	Knowledge Distillation, model	Unverified
Learning from Imperfect Data: Towards Efficient Knowledge Distillation of Autoregressive Language Models for Text-to-SQL	Oct 15, 2024	Knowledge Distillation, Text to SQL	Unverified
Learning from Matured Dumb Teacher for Fine Generalization	Aug 12, 2021	image-classification, Image Classification	Unverified
Learning Human-Human Interactions in Images from Weak Textual Supervision	Apr 27, 2023	Human-Human Interaction Recognition, Image Captioning	Unverified
MixMix: All You Need for Data-Free Compression Are Feature and Data Mixing	Nov 19, 2020	All, Knowledge Distillation	Unverified
Learning Interpretation with Explainable Knowledge Distillation	Nov 12, 2021	Knowledge Distillation, Model Compression	Unverified
Learning Knowledge Representation with Meta Knowledge Distillation for Single Image Super-Resolution	Jul 18, 2022	Image Super-Resolution, Knowledge Distillation	Unverified
Learning Lightweight Object Detectors via Multi-Teacher Progressive Distillation	Aug 17, 2023	Edge-computing, Instance Segmentation	Unverified
Learning Lightweight Pedestrian Detector with Hierarchical Knowledge Distillation	Sep 20, 2019	Knowledge Distillation, Pedestrian Detection	Unverified
Learning Modality-agnostic Representation for Semantic Segmentation from Any Modalities	Jul 16, 2024	Knowledge Distillation, Semantic Segmentation	Unverified
Learning Student-Friendly Teacher Networks for Knowledge Distillation	Feb 12, 2021	Knowledge Distillation, Transfer Learning	Unverified
Learning Student Networks via Feature Embedding	Dec 17, 2018	Knowledge Distillation	Unverified
Learning Task-Agnostic Embedding of Multiple Black-Box Experts for Multi-Task Model Fusion	Jan 1, 2020	Knowledge Distillation	Unverified
Learning the Wrong Lessons: Inserting Trojans During Knowledge Distillation	Mar 9, 2023	Knowledge Distillation	Unverified
Learning Through Guidance: Knowledge Distillation for Endoscopic Image Classification	Aug 17, 2023	Classification, Feature Engineering	Unverified



Datasets: ImageNet, CIFAR-100, COCO (Common Objects in Context), COCO 2017 val, PASCAL VOC, KITTI

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	ScaleKD (T:BEiT-L S:ViT-B/14)	Top-1 accuracy %	86.43	—	Unverified
2	ScaleKD (T:Swin-L S:ViT-B/16)	Top-1 accuracy %	85.53	—	Unverified
3	ScaleKD (T:Swin-L S:ViT-S/16)	Top-1 accuracy %	83.93	—	Unverified
4	ScaleKD (T:Swin-L S:Swin-T)	Top-1 accuracy %	83.8	—	Unverified
5	KD++(T: regnety-16GF S:ViT-B)	Top-1 accuracy %	83.6	—	Unverified
6	VkD (T:RegNety 160 S:DeiT-S)	Top-1 accuracy %	82.9	—	Unverified
7	SpectralKD (T:Swin-S S:Swin-T)	Top-1 accuracy %	82.7	—	Unverified
8	ScaleKD (T:Swin-L S:ResNet-50)	Top-1 accuracy %	82.55	—	Unverified
9	DiffKD (T:Swin-L S: Swin-T)	Top-1 accuracy %	82.5	—	Unverified
10	DIST (T: Swin-L S: Swin-T)	Top-1 accuracy %	82.3	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	SRD (T:resnet-32x4, S:shufflenet-v2)	Top-1 Accuracy (%)	79.86	—	Unverified
2	shufflenet-v2(T:resnet-32x4, S:shufflenet-v2)	Top-1 Accuracy (%)	78.76	—	Unverified
3	MV-MR (T: CLIP/ViT-B-16 S: resnet50)	Top-1 Accuracy (%)	78.6	—	Unverified
4	resnet8x4 (T: resnet32x4 S: resnet8x4)	Top-1 Accuracy (%)	78.28	—	Unverified
5	resnet8x4 (T: resnet32x4 S: resnet8x4 [modified])	Top-1 Accuracy (%)	78.08	—	Unverified
6	ReviewKD++(T:resnet-32x4, S:shufflenet-v2)	Top-1 Accuracy (%)	77.93	—	Unverified
7	ReviewKD++(T:resnet-32x4, S:shufflenet-v1)	Top-1 Accuracy (%)	77.68	—	Unverified
8	resnet8x4 (T: resnet32x4 S: resnet8x4)	Top-1 Accuracy (%)	77.5	—	Unverified
9	resnet8x4 (T: resnet32x4 S: resnet8x4)	Top-1 Accuracy (%)	76.68	—	Unverified
10	resnet8x4 (T: resnet32x4 S: resnet8x4)	Top-1 Accuracy (%)	76.31	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	LSHFM (T: ResNet101 S: ResNet50)	mAP	77.16	—	Unverified
2	LSHFM (T: ResNet101 S: MobileNetV2)	mAP	73.73	—	Unverified
3	ADLIK-Faster (T: Faster R-CNN vit-base S: Faster R-CNN deit-small)	box AP	47.6	—	Unverified
4	ADLIK-Mask (T: Mask R-CNN vit-base S: Mask R-CNN deit-small)	mask AP	42.4	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	ReviewKD++(T: faster rcnn(resnet101), S:faster rcnn(resnet50))	AP@0.5	61.8	—	Unverified
2	ReviewKD++(T: faster rcnn(resnet101), S:faster rcnn(resnet18))	AP@0.5	57.96	—	Unverified
3	ReviewKD++(T: faster rcnn(resnet101), S:faster rcnn(mobilenet-v2))	AP@0.5	55.18	—	Unverified
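The AP@0.5 metric above counts a detection as correct only when its intersection over union (IoU) with a not-yet-matched ground-truth box of the same class is at least 0.5. Precision is then averaged over recall levels. The sketch below covers only that IoU matching test, assuming boxes given as `(x1, y1, x2, y2)` corner tuples. It is not a full AP implementation, which would also need confidence ranking, one-to-one matching, and precision-recall integration. The helper names are hypothetical.

```python
from typing import Tuple

# Axis-aligned box as (x1, y1, x2, y2) with x1 < x2 and y1 < y2 (assumed convention)
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp to zero when the boxes do not overlap
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred: Box, gt: Box, threshold: float = 0.5) -> bool:
    """True when the prediction overlaps the ground truth enough to count at AP@0.5."""
    return iou(pred, gt) >= threshold
```

The threshold is a parameter so that the same test can express stricter variants, such as an IoU of 0.75.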

#	Model	Metric	Claimed	Verified	Status
1	LSHFM (T: ResNet101 S: ResNet50)	mAP	93.17	—	Unverified
2	LSHFM (T: ResNet101 S: MobileNetV2)	mAP	90.14	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	TIE-KD (T: Adabins S: MobileNetV2)	RMSE	2.43	—	Unverified
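RMSE in the table above is an error metric, so lower is better, unlike the accuracy and AP figures elsewhere on this page. AdaBins, the teacher in that entry, is a monocular depth-estimation model. The sketch below is a minimal RMSE computation over flattened depth maps. It assumes that a ground-truth value of 0 marks a pixel with no measurement, a common convention for sparse depth ground truth that is adopted here as an assumption. Official evaluation protocols may also crop the image or cap the depth range, and that is omitted here.

```python
import math
from typing import Sequence


def depth_rmse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Root-mean-square error over pixels with valid ground truth (target > 0)."""
    # Skip pixels without a ground-truth measurement (assumed to be encoded as 0)
    pairs = [(p, t) for p, t in zip(pred, target) if t > 0]
    if not pairs:
        raise ValueError("no valid ground-truth pixels")
    return math.sqrt(sum((p - t) ** 2 for p, t in pairs) / len(pairs))
```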