
Knowledge Distillation

Knowledge distillation is the process of transferring knowledge from a large model to a smaller one. While large models (such as very deep neural networks or ensembles of many models) have higher knowledge capacity than small models, this capacity might not be fully utilized.
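The canonical recipe, introduced in "Distilling the Knowledge in a Neural Network" (listed below), trains the student to match the teacher's temperature-softened output distribution alongside the ground-truth labels. A minimal PyTorch sketch of that loss follows; the function name, the temperature T, and the mixing weight alpha are illustrative choices, not values taken from any paper on this page.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Soft-target distillation loss in the style of Hinton et al. (2015).

    T and alpha are hypothetical defaults chosen for illustration.
    """
    # KL divergence between the temperature-softened teacher and student
    # distributions; the T*T factor keeps gradient magnitudes roughly
    # constant as the temperature changes.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Ordinary cross-entropy against the hard ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Typical use in a training step: freeze the teacher's outputs, then
# backpropagate only through the student.
#   with torch.no_grad():
#       teacher_logits = teacher(images)
#   loss = distillation_loss(student(images), teacher_logits, labels)
```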

Papers

Showing 851–900 of 4240 papers

Title | Status | Hype
Distill on the Go: Online knowledge distillation in self-supervised learning | Code | 1
Real-time Event Recognition of Long-distance Distributed Vibration Sensing with Knowledge Distillation and Hardware Acceleration | Code | 1
DASpeech: Directed Acyclic Transformer for Fast and High-quality Speech-to-Speech Translation | Code | 1
DASS: Distilled Audio State Space Models Are Stronger and More Duration-Scalable Learners | Code | 1
Data Diversification: A Simple Strategy For Neural Machine Translation | Code | 1
Better Estimation of the KL Divergence Between Language Models | Code | 1
Refine Myself by Teaching Myself: Feature Refinement via Self-Knowledge Distillation | Code | 1
Regularizing Class-wise Predictions via Self-knowledge Distillation | Code | 1
Data Efficient Language-supervised Zero-shot Recognition with Optimal Transport Distillation | Code | 1
Distilled Split Deep Neural Networks for Edge-Assisted Real-Time Systems | Code | 1
Distilled Semantics for Comprehensive Scene Understanding from Videos | Code | 1
Going Beyond Classification Accuracy Metrics in Model Compression | Code | 1
Adjoined Networks: A Training Paradigm with Applications to Network Compression | Code | 1
Data-Free Class-Incremental Hand Gesture Recognition | Code | 1
BEVDistill: Cross-Modal BEV Distillation for Multi-View 3D Object Detection | Code | 1
Rethinking Centered Kernel Alignment in Knowledge Distillation | Code | 1
BEV-LGKD: A Unified LiDAR-Guided Knowledge Distillation Framework for BEV 3D Object Detection | Code | 1
Distilling a Powerful Student Model via Online Knowledge Distillation | Code | 1
SimDistill: Simulated Multi-modal Distillation for BEV 3D Object Detection | Code | 1
Rethinking Momentum Knowledge Distillation in Online Continual Learning | Code | 1
Data-Free Knowledge Distillation for Heterogeneous Federated Learning | Code | 1
Distilling Visual Priors from Self-Supervised Learning | Code | 1
Return of the Encoder: Maximizing Parameter Efficiency for SLMs | Code | 1
Reverse Knowledge Distillation: Training a Large Model using a Small One for Retinal Image Matching on Limited Data | Code | 1
Beyond Generic: Enhancing Image Captioning with Real-World Knowledge using Vision-Language Pre-Training Model | Code | 1
Data-Free Knowledge Distillation via Feature Exchange and Activation Region Constraint | Code | 1
Beyond Preserved Accuracy: Evaluating Loyalty and Robustness of BERT Compression | Code | 1
Revisiting Prototypical Network for Cross Domain Few-Shot Learning | Code | 1
A Neural Span-Based Continual Named Entity Recognition Model | Code | 1
Beyond Self-Supervision: A Simple Yet Effective Network Distillation Alternative to Improve Backbones | Code | 1
AlphaFold Distillation for Protein Design | Code | 1
Robust Spatiotemporal Traffic Forecasting with Reinforced Dynamic Adversarial Training | Code | 1
Distill the Image to Nowhere: Inversion Knowledge Distillation for Multimodal Machine Translation | Code | 1
DA-Mamba: Domain Adaptive Hybrid Mamba-Transformer Based One-Stage Object Detection | Code | 1
ROSITA: Refined BERT cOmpreSsion with InTegrAted techniques | Code | 1
Distilling Script Knowledge from Large Language Models for Constrained Language Planning | Code | 1
Distilling Out-of-Distribution Robustness from Vision-Language Foundation Models | Code | 1
Distilling DETR with Visual-Linguistic Knowledge for Open-Vocabulary Object Detection | Code | 1
Distilling the Knowledge in a Neural Network | Code | 1
Distilling Object Detectors via Decoupled Features | Code | 1
BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation | Code | 1
Scaling Sparse and Dense Retrieval in Decoder-Only LLMs | Code | 1
AltDiffusion: A Multilingual Text-to-Image Diffusion Model | Code | 1
SCPNet: Semantic Scene Completion on Point Cloud | Code | 1
Distilling Object Detectors with Feature Richness | Code | 1
Always Clear Depth: Robust Monocular Depth Estimation under Adverse Weather | Code | 1
Bi-directional Weakly Supervised Knowledge Distillation for Whole Slide Image Classification | Code | 1
Distilling Image Classifiers in Object Detectors | Code | 1
Self-Supervised Adaptation for Video Super-Resolution | Code | 1
Distilling the Knowledge of BERT for Sequence-to-Sequence ASR | Code | 1
Page 18 of 85

Benchmark Results

In the Model column, T: denotes the teacher network and S: the student. No result on this page has been verified yet, so the Verified column is empty.

# | Model | Metric | Claimed | Verified | Status
1 | ScaleKD (T: BEiT-L, S: ViT-B/14) | Top-1 accuracy (%) | 86.43 | - | Unverified
2 | ScaleKD (T: Swin-L, S: ViT-B/16) | Top-1 accuracy (%) | 85.53 | - | Unverified
3 | ScaleKD (T: Swin-L, S: ViT-S/16) | Top-1 accuracy (%) | 83.93 | - | Unverified
4 | ScaleKD (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 83.8 | - | Unverified
5 | KD++ (T: regnety-16GF, S: ViT-B) | Top-1 accuracy (%) | 83.6 | - | Unverified
6 | VkD (T: RegNety 160, S: DeiT-S) | Top-1 accuracy (%) | 82.9 | - | Unverified
7 | SpectralKD (T: Swin-S, S: Swin-T) | Top-1 accuracy (%) | 82.7 | - | Unverified
8 | ScaleKD (T: Swin-L, S: ResNet-50) | Top-1 accuracy (%) | 82.55 | - | Unverified
9 | DiffKD (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 82.5 | - | Unverified
10 | DIST (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 82.3 | - | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | SRD (T: resnet-32x4, S: shufflenet-v2) | Top-1 accuracy (%) | 79.86 | - | Unverified
2 | shufflenet-v2 (T: resnet-32x4, S: shufflenet-v2) | Top-1 accuracy (%) | 78.76 | - | Unverified
3 | MV-MR (T: CLIP/ViT-B-16, S: resnet50) | Top-1 accuracy (%) | 78.6 | - | Unverified
4 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 78.28 | - | Unverified
5 | resnet8x4 (T: resnet32x4, S: resnet8x4 [modified]) | Top-1 accuracy (%) | 78.08 | - | Unverified
6 | ReviewKD++ (T: resnet-32x4, S: shufflenet-v2) | Top-1 accuracy (%) | 77.93 | - | Unverified
7 | ReviewKD++ (T: resnet-32x4, S: shufflenet-v1) | Top-1 accuracy (%) | 77.68 | - | Unverified
8 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 77.5 | - | Unverified
9 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 76.68 | - | Unverified
10 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 76.31 | - | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | LSHFM (T: ResNet101, S: ResNet50) | mAP | 93.17 | - | Unverified
2 | LSHFM (T: ResNet101, S: MobileNetV2) | mAP | 90.14 | - | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | TIE-KD (T: Adabins, S: MobileNetV2) | RMSE | 2.43 | - | Unverified