SOTAVerified

Knowledge Distillation

Knowledge distillation is the process of transferring knowledge from a large model to a smaller one. While large models (such as very deep neural networks or ensembles of many models) have a higher knowledge capacity than small models, that capacity may not be fully utilized. Distillation exploits this gap: a compact "student" model is trained to reproduce the outputs of the large "teacher" (typically its softened class probabilities), retaining much of the teacher's accuracy at a fraction of the inference cost.
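
As a concrete illustration, the sketch below implements the classic soft-label distillation objective of Hinton et al. (2015) in PyTorch. It is a minimal example rather than the method of any particular paper listed below; the function name and the default temperature and mixing weight are illustrative choices.

    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
        # Soften both output distributions with temperature T so the student
        # can learn from the teacher's relative class similarities.
        soft_targets = F.softmax(teacher_logits / T, dim=-1)
        log_student = F.log_softmax(student_logits / T, dim=-1)
        # Scale the KL term by T^2 to keep its gradients comparable in
        # magnitude to the hard-label term as T varies.
        kd_term = F.kl_div(log_student, soft_targets, reduction="batchmean") * (T * T)
        # Ordinary cross-entropy against the ground-truth labels.
        ce_term = F.cross_entropy(student_logits, labels)
        return alpha * kd_term + (1.0 - alpha) * ce_term

A training step would minimize distillation_loss(student(x), teacher(x).detach(), y) on each batch (x, y); detaching the teacher's logits keeps gradients from flowing into the frozen teacher.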

Papers

Showing 801–850 of 4240 papers

Title | Status | Hype
Distilling Linguistic Context for Language Model Compression | Code | 1
Baby Llama: knowledge distillation from an ensemble of teachers trained on a small dataset with no performance penalty | Code | 1
Backdoor Attacks on Self-Supervised Learning | Code | 1
On-Device Next-Item Recommendation with Self-Supervised Knowledge Distillation | Code | 1
Backdoor Cleansing with Unlabeled Data | Code | 1
Distilling Knowledge from Ensembles of Acoustic Models for Joint CTC-Attention End-to-End Speech Recognition | Code | 1
Distilling Image Classifiers in Object Detectors | Code | 1
Cross-Layer Distillation with Semantic Calibration | Code | 1
Cross-Level Distillation and Feature Denoising for Cross-Domain Few-Shot Classification | Code | 1
On information captured by neural networks: connections with memorization and generalization | Code | 1
Online Knowledge Distillation for Efficient Pose Estimation | Code | 1
Online Knowledge Distillation via Collaborative Learning | Code | 1
Online Prototype Learning for Online Continual Learning | Code | 1
Online Speculative Decoding | Code | 1
Distilling Knowledge from Reader to Retriever for Question Answering | Code | 1
Fcaformer: Forward Cross Attention in Hybrid Vision Transformer | Code | 1
Balanced Knowledge Distillation for Long-tailed Learning | Code | 1
Cross-Modal Fusion Distillation for Fine-Grained Sketch-Based Image Retrieval | Code | 1
Cross-modality Data Augmentation for End-to-End Sign Language Translation | Code | 1
DARTS: Double Attention Reference-based Transformer for Super-resolution | Code | 1
Cross-Modality Knowledge Distillation Network for Monocular 3D Object Detection | Code | 1
Orca: Progressive Learning from Complex Explanation Traces of GPT-4 | Code | 1
Overcoming Catastrophic Forgetting beyond Continual Learning: Balanced Training for Neural Machine Translation | Code | 1
Overcoming Catastrophic Forgetting in Incremental Object Detection via Elastic Response Distillation | Code | 1
OVO: Open-Vocabulary Occupancy | Code | 1
OVTrack: Open-Vocabulary Multiple Object Tracking | Code | 1
Aligned Structured Sparsity Learning for Efficient Image Super-Resolution | Code | 1
PARADE: Passage Representation Aggregation for Document Reranking | Code | 1
PA-Seg: Learning from Point Annotations for 3D Medical Image Segmentation using Contextual Regularization and Cross Knowledge Distillation | Code | 1
CSAKD: Knowledge Distillation with Cross Self-Attention for Hyperspectral and Multispectral Image Fusion | Code | 1
Bayes Conditional Distribution Estimation for Knowledge Distillation Based on Conditional Mutual Information | Code | 1
CTC-based Non-autoregressive Textless Speech-to-Speech Translation | Code | 1
DIOD: Self-Distillation Meets Object Discovery | Code | 1
Directed Acyclic Graph Factorization Machines for CTR Prediction via Knowledge Distillation | Code | 1
Distilling Holistic Knowledge with Graph Neural Networks | Code | 1
Curriculum Learning for Dense Retrieval Distillation | Code | 1
Curriculum Temperature for Knowledge Distillation | Code | 1
BearingPGA-Net: A Lightweight and Deployable Bearing Fault Diagnosis Network via Decoupled Knowledge Distillation and FPGA Acceleration | Code | 1
Digging into contrastive learning for robust depth estimation with diffusion models | Code | 1
Extending global-local view alignment for self-supervised learning with remote sensing imagery | Code | 1
PlaSma: Making Small Language Models Better Procedural Knowledge Models for (Counterfactual) Planning | Code | 1
PocketNet: Extreme Lightweight Face Recognition Network using Neural Architecture Search and Multi-Step Knowledge Distillation | Code | 1
Point, Segment and Count: A Generalized Framework for Object Counting | Code | 1
Distilling Knowledge from Refinement in Multiple Instance Detection Networks | Code | 1
PoseNet3D: Learning Temporally Consistent 3D Human Pose via Knowledge Distillation | Code | 1
Kaizen: Practical Self-supervised Continual Learning with Continual Fine-tuning | Code | 1
Distilling Meta Knowledge on Heterogeneous Graph for Illicit Drug Trafficker Detection on Social Media | Code | 1
C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval | Code | 1
A New Knowledge Distillation Network for Incremental Few-Shot Surface Defect Detection | Code | 1
Distilling Cross-Task Knowledge via Relationship Matching | Code | 1
Page 17 of 85

Benchmark Results

In the Model column, T: denotes the teacher and S: the student; a blank Verified cell means no independently verified result has been recorded, matching the Unverified status.

# | Model | Metric | Claimed | Verified | Status
1 | ScaleKD (T: BEiT-L, S: ViT-B/14) | Top-1 accuracy (%) | 86.43 | | Unverified
2 | ScaleKD (T: Swin-L, S: ViT-B/16) | Top-1 accuracy (%) | 85.53 | | Unverified
3 | ScaleKD (T: Swin-L, S: ViT-S/16) | Top-1 accuracy (%) | 83.93 | | Unverified
4 | ScaleKD (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 83.8 | | Unverified
5 | KD++ (T: regnety-16GF, S: ViT-B) | Top-1 accuracy (%) | 83.6 | | Unverified
6 | VkD (T: RegNety 160, S: DeiT-S) | Top-1 accuracy (%) | 82.9 | | Unverified
7 | SpectralKD (T: Swin-S, S: Swin-T) | Top-1 accuracy (%) | 82.7 | | Unverified
8 | ScaleKD (T: Swin-L, S: ResNet-50) | Top-1 accuracy (%) | 82.55 | | Unverified
9 | DiffKD (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 82.5 | | Unverified
10 | DIST (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 82.3 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SRD (T: resnet-32x4, S: shufflenet-v2) | Top-1 accuracy (%) | 79.86 | | Unverified
2 | shufflenet-v2 (T: resnet-32x4, S: shufflenet-v2) | Top-1 accuracy (%) | 78.76 | | Unverified
3 | MV-MR (T: CLIP/ViT-B-16, S: resnet50) | Top-1 accuracy (%) | 78.6 | | Unverified
4 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 78.28 | | Unverified
5 | resnet8x4 (T: resnet32x4, S: resnet8x4 [modified]) | Top-1 accuracy (%) | 78.08 | | Unverified
6 | ReviewKD++ (T: resnet-32x4, S: shufflenet-v2) | Top-1 accuracy (%) | 77.93 | | Unverified
7 | ReviewKD++ (T: resnet-32x4, S: shufflenet-v1) | Top-1 accuracy (%) | 77.68 | | Unverified
8 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 77.5 | | Unverified
9 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 76.68 | | Unverified
10 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 76.31 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | LSHFM (T: ResNet101, S: ResNet50) | mAP | 93.17 | | Unverified
2 | LSHFM (T: ResNet101, S: MobileNetV2) | mAP | 90.14 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TIE-KD (T: Adabins, S: MobileNetV2) | RMSE | 2.43 | | Unverified
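
For reference, the top-1 accuracy reported in the first two tables counts a prediction as correct only when the model's single highest-scoring class matches the ground-truth label. A minimal PyTorch sketch, with illustrative names:

    import torch

    def top1_accuracy(logits: torch.Tensor, labels: torch.Tensor) -> float:
        # A prediction is correct only if the argmax class equals the
        # ground-truth label; returned as a percentage.
        correct = (logits.argmax(dim=-1) == labels).float().mean()
        return correct.item() * 100.0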