SOTAVerified

Knowledge Distillation

Knowledge distillation is the process of transferring knowledge from a large model to a smaller one. While large models (such as very deep neural networks or ensembles of many models) have a higher knowledge capacity than small models, that capacity may not be fully utilized; distillation trains a compact "student" model to reproduce the behaviour of the larger "teacher", retaining much of its accuracy at a fraction of the inference cost.
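
The classic recipe behind many of the papers listed below is the soft-label objective of Hinton et al. (2015): the student is trained to match the teacher's temperature-softened output distribution in addition to the ground-truth labels. Below is a minimal PyTorch sketch of that loss for orientation only; the function name, temperature T, and mixing weight alpha are illustrative choices, not parameters taken from any paper on this page.

```python
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Soft-label knowledge distillation loss (Hinton et al., 2015).

    T (temperature) and alpha (soft/hard mixing weight) are illustrative
    defaults, not values taken from any paper listed on this page.
    """
    # KL divergence between the temperature-softened teacher and student
    # distributions; the T^2 factor keeps gradient magnitudes comparable
    # across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard


if __name__ == "__main__":
    # Toy check with random logits: batch of 8 examples, 100 classes.
    student_logits = torch.randn(8, 100)
    teacher_logits = torch.randn(8, 100)
    labels = torch.randint(0, 100, (8,))
    print(distillation_loss(student_logits, teacher_logits, labels).item())
```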

Papers

Showing 1–50 of 4240 papers

Title | Status | Hype
O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson? | Code | 7
Awesome Multi-modal Object Tracking | Code | 5
A Survey on Knowledge Distillation of Large Language Models | Code | 5
MobileSAMv2: Faster Segment Anything to Everything | Code | 5
AlphaPose: Whole-Body Regional Multi-Person Pose Estimation and Tracking in Real-Time | Code | 5
SAMPart3D: Segment Any Part in 3D Objects | Code | 4
LLM Inference Unveiled: Survey and Roofline Model Insights | Code | 4
Towards Cross-Tokenizer Distillation: the Universal Logit Distillation Loss for LLMs | Code | 4
BitDistiller: Unleashing the Potential of Sub-4-Bit LLMs via Self-Distillation | Code | 4
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling | Code | 4
Effective Whole-body Pose Estimation with Two-stages Distillation | Code | 4
Vision-Language Models for Vision Tasks: A Survey | Code | 4
Efficient and Generalizable Speaker Diarization via Structured Pruning of Self-Supervised Models | Code | 3
Efficient Reasoning Models: A Survey | Code | 3
A Survey on Inference Optimization Techniques for Mixture of Experts Models | Code | 3
LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation | Code | 3
Compact Language Models via Pruning and Knowledge Distillation | Code | 3
Hydra-MDP: End-to-end Multimodal Planning with Multi-target Hydra-Distillation | Code | 3
Recurrent Drafter for Fast Speculative Decoding in Large Language Models | Code | 3
PromptKD: Unsupervised Prompt Distillation for Vision-Language Models | Code | 3
Logit Standardization in Knowledge Distillation | Code | 3
DistiLLM: Towards Streamlined Distillation for Large Language Models | Code | 3
Generalized Robot 3D Vision-Language Model with Fast Rendering and Pre-Training Vision-Language Alignment | Code | 3
ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech | Code | 3
CMKD: CNN/Transformer-Based Cross-Model Knowledge Distillation for Audio Classification | Code | 3
N-LTP: An Open-source Neural Language Technology Platform for Chinese | Code | 3
Semi-Supervised Speech Recognition via Local Prior Matching | Code | 3
Structural Entropy Guided Agent for Detecting and Repairing Knowledge Deficiencies in LLMs | Code | 2
Efficient Multivariate Time Series Forecasting via Calibrated Language Models with Privileged Knowledge Distillation | Code | 2
Distillation-Supervised Convolutional Low-Rank Adaptation for Efficient Image Super-Resolution | Code | 2
Learning Occlusion-Robust Vision Transformers for Real-Time UAV Tracking | Code | 2
SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement | Code | 2
Scaling Down Text Encoders of Text-to-Image Diffusion Models | Code | 2
A Comprehensive Survey on Knowledge Distillation | Code | 2
LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization | Code | 2
JL1-CD: A New Benchmark for Remote Sensing Change Detection and a Robust Multi-Teacher Knowledge Distillation Framework | Code | 2
Event Stream-based Visual Object Tracking: HDETrack V2 and A High-Definition Benchmark | Code | 2
LightGNN: Simple Graph Neural Network for Recommendation | Code | 2
Optimizing Edge AI: A Comprehensive Survey on Data, Model, and System Strategies | Code | 2
Learning an Adaptive and View-Invariant Vision Transformer for Real-Time UAV Tracking | Code | 2
Learnable Prompting SAM-induced Knowledge Distillation for Semi-supervised Medical Image Segmentation | Code | 2
BiM-VFI: Bidirectional Motion Field-Guided Frame Interpolation for Video with Non-uniform Motions | Code | 2
Wasserstein Distance Rivals Kullback-Leibler Divergence for Knowledge Distillation | Code | 2
BiomedCoOp: Learning to Prompt for Biomedical Vision-Language Models | Code | 2
ScaleKD: Strong Vision Transformers Could Be Excellent Teachers | Code | 2
MiniPLM: Knowledge Distillation for Pre-Training Language Models | Code | 2
Distillation-Free One-Step Diffusion for Real-World Image Super-Resolution | Code | 2
Learning from Committee: Reasoning Distillation from a Mixture of Teachers with Peer-Review | Code | 2
Ruri: Japanese General Text Embeddings | Code | 2
Towards A Generalizable Pathology Foundation Model via Unified Knowledge Distillation | Code | 2

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | ScaleKD (T: BEiT-L, S: ViT-B/14) | Top-1 accuracy (%) | 86.43 | - | Unverified
2 | ScaleKD (T: Swin-L, S: ViT-B/16) | Top-1 accuracy (%) | 85.53 | - | Unverified
3 | ScaleKD (T: Swin-L, S: ViT-S/16) | Top-1 accuracy (%) | 83.93 | - | Unverified
4 | ScaleKD (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 83.8 | - | Unverified
5 | KD++ (T: RegNetY-16GF, S: ViT-B) | Top-1 accuracy (%) | 83.6 | - | Unverified
6 | VkD (T: RegNetY-160, S: DeiT-S) | Top-1 accuracy (%) | 82.9 | - | Unverified
7 | SpectralKD (T: Swin-S, S: Swin-T) | Top-1 accuracy (%) | 82.7 | - | Unverified
8 | ScaleKD (T: Swin-L, S: ResNet-50) | Top-1 accuracy (%) | 82.55 | - | Unverified
9 | DiffKD (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 82.5 | - | Unverified
10 | DIST (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 82.3 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SRD (T: resnet-32x4, S: shufflenet-v2) | Top-1 Accuracy (%) | 79.86 | - | Unverified
2 | shufflenet-v2 (T: resnet-32x4, S: shufflenet-v2) | Top-1 Accuracy (%) | 78.76 | - | Unverified
3 | MV-MR (T: CLIP/ViT-B-16, S: resnet50) | Top-1 Accuracy (%) | 78.6 | - | Unverified
4 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 Accuracy (%) | 78.28 | - | Unverified
5 | resnet8x4 (T: resnet32x4, S: resnet8x4 [modified]) | Top-1 Accuracy (%) | 78.08 | - | Unverified
6 | ReviewKD++ (T: resnet-32x4, S: shufflenet-v2) | Top-1 Accuracy (%) | 77.93 | - | Unverified
7 | ReviewKD++ (T: resnet-32x4, S: shufflenet-v1) | Top-1 Accuracy (%) | 77.68 | - | Unverified
8 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 Accuracy (%) | 77.5 | - | Unverified
9 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 Accuracy (%) | 76.68 | - | Unverified
10 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 Accuracy (%) | 76.31 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | LSHFM (T: ResNet101, S: ResNet50) | mAP | 93.17 | - | Unverified
2 | LSHFM (T: ResNet101, S: MobileNetV2) | mAP | 90.14 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TIE-KD (T: Adabins, S: MobileNetV2) | RMSE | 2.43 | - | Unverified