
Knowledge Distillation

Knowledge distillation is the process of transferring knowledge from a large model to a smaller one. While large models (such as very deep neural networks or ensembles of many models) have higher knowledge capacity than small models, this capacity might not be fully utilized. Distillation exploits this by training a compact student model to reproduce the behavior of a larger teacher model, so that much of the teacher's accuracy can be retained at a fraction of the memory and inference cost.
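
Below is a minimal sketch of how this transfer is typically implemented in practice, assuming the classic soft-label recipe (the student matches the teacher's temperature-softened class probabilities alongside the usual hard-label loss) and a PyTorch setup. The function name, temperature T, and weight alpha are illustrative defaults, not values drawn from any paper listed on this page.

```python
# Minimal soft-label distillation loss sketch (PyTorch assumed).
# All names and hyperparameter values here are illustrative, not from a specific paper.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Blend a soft KL term (student vs. temperature-scaled teacher) with hard-label CE."""
    # Soft targets: KL divergence between temperature-softened distributions,
    # scaled by T^2 so gradient magnitudes stay comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Usage sketch: the teacher stays frozen; only the student is updated.
# with torch.no_grad():
#     teacher_logits = teacher(x)
# loss = distillation_loss(student(x), teacher_logits, y)
# loss.backward()
```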

Papers

Showing 1451–1500 of 4240 papers

Title | Status | Hype
Enhancing Scalability in Recommender Systems through Lottery Ticket Hypothesis and Knowledge Distillation-based Neural Network Pruning | – | 0
Exclusivity-Consistency Regularized Knowledge Distillation for Face Recognition | – | 0
Enhancing Romanian Offensive Language Detection through Knowledge Distillation, Multi-Task Learning, and Data Augmentation | – | 0
Enhancing Review Comprehension with Domain-Specific Commonsense | – | 0
Enhancing Once-For-All: A Study on Parallel Blocks, Skip Connections and Early Exits | – | 0
Expediting Contrastive Language-Image Pretraining via Self-distilled Encoders | – | 0
Experimentation in Content Moderation using RWKV | – | 0
Experimenting with Knowledge Distillation techniques for performing Brain Tumor Segmentation | – | 0
Explainability-Driven Leaf Disease Classification Using Adversarial Training and Knowledge Distillation | – | 0
Explainable Knowledge Distillation for On-device Chest X-Ray Classification | – | 0
Explainable LLM-driven Multi-dimensional Distillation for E-Commerce Relevance Learning | – | 0
Explaining Knowledge Distillation by Quantifying the Knowledge | – | 0
ConaCLIP: Exploring Distillation of Fully-Connected Knowledge Interaction Graph for Lightweight Text-Image Retrieval | – | 0
Explaining Sequence-Level Knowledge Distillation as Data-Augmentation for Neural Machine Translation | – | 0
A General Multiple Data Augmentation Based Framework for Training Deep Neural Networks | – | 0
Explicit Connection Distillation | – | 0
A Transformer-in-Transformer Network Utilizing Knowledge Distillation for Image Recognition | – | 0
Explicit Knowledge Transfer for Weakly-Supervised Code Generation | – | 0
FlyKD: Graph Knowledge Distillation on the Fly with Curriculum Learning | – | 0
Exploiting Unlabelled Photos for Stronger Fine-Grained SBIR | – | 0
Enhancing Modality-Agnostic Representations via Meta-Learning for Brain Tumor Segmentation | – | 0
Enhancing Mapless Trajectory Prediction through Knowledge Distillation | – | 0
Exploring compressibility of transformer based text-to-music (TTM) models | – | 0
Exploring Dark Knowledge under Various Teacher Capacities and Addressing Capacity Mismatch | – | 0
Compression of end-to-end non-autoregressive image-to-speech system for low-resourced devices | – | 0
Compression of Deep Learning Models for Text: A Survey | – | 0
Generalized Supervised Contrastive Learning | – | 0
Exploring Extreme Quantization in Spiking Language Models | – | 0
Compression of Acoustic Event Detection Models With Quantized Distillation | – | 0
Continual Learning for Class- and Domain-Incremental Semantic Segmentation | – | 0
FLAR: A Unified Prototype Framework for Few-Sample Lifelong Active Recognition | – | 0
For the Misgendered Chinese in Gender Bias Research: Multi-Task Learning with Knowledge Distillation for Pinyin Name-Gender Prediction | – | 0
Compressing Visual-linguistic Model via Knowledge Distillation | – | 0
Fully Synthetic Data Improves Neural Machine Translation with Knowledge Distillation | – | 0
Enhancing Generalization in Chain of Thought Reasoning for Smaller Models | – | 0
A Theoretical Analysis of Soft-Label vs Hard-Label Training in Neural Networks | – | 0
Enhancing Few-shot Keyword Spotting Performance through Pre-Trained Self-supervised Speech Models | – | 0
Enhancing Data-Free Adversarial Distillation with Activation Regularization and Virtual Interpolation | – | 0
Exploring Self- and Cross-Triplet Correlations for Human-Object Interaction Detection | – | 0
Unsupervised Continual Learning Via Pseudo Labels | – | 0
Continual Learning with Diffusion-based Generative Replay for Industrial Streaming Data | – | 0
A Note on Knowledge Distillation Loss Function for Object Classification | – | 0
Continual Learning with Dirichlet Generative-based Rehearsal | – | 0
Exploring the Limits of Simple Learners in Knowledge Distillation for Document Classification with DocBERT | – | 0
Extending Label Smoothing Regularization with Self-Knowledge Distillation | – | 0
Extracting General-use Transformers for Low-resource Languages via Knowledge Distillation | – | 0
Extracting knowledge from features with multilevel abstraction | – | 0
Compressing VAE-Based Out-of-Distribution Detectors for Embedded Deployment | – | 0
Extract then Distill: Efficient and Effective Task-Agnostic BERT Distillation | – | 0
Enhancing CTC-Based Visual Speech Recognition | – | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | ScaleKD (T:BEiT-L S:ViT-B/14) | Top-1 accuracy % | 86.43 | – | Unverified
2 | ScaleKD (T:Swin-L S:ViT-B/16) | Top-1 accuracy % | 85.53 | – | Unverified
3 | ScaleKD (T:Swin-L S:ViT-S/16) | Top-1 accuracy % | 83.93 | – | Unverified
4 | ScaleKD (T:Swin-L S:Swin-T) | Top-1 accuracy % | 83.8 | – | Unverified
5 | KD++ (T: regnety-16GF S:ViT-B) | Top-1 accuracy % | 83.6 | – | Unverified
6 | VkD (T:RegNety 160 S:DeiT-S) | Top-1 accuracy % | 82.9 | – | Unverified
7 | SpectralKD (T:Swin-S S:Swin-T) | Top-1 accuracy % | 82.7 | – | Unverified
8 | ScaleKD (T:Swin-L S:ResNet-50) | Top-1 accuracy % | 82.55 | – | Unverified
9 | DiffKD (T:Swin-L S: Swin-T) | Top-1 accuracy % | 82.5 | – | Unverified
10 | DIST (T: Swin-L S: Swin-T) | Top-1 accuracy % | 82.3 | – | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SRD (T:resnet-32x4, S:shufflenet-v2) | Top-1 Accuracy (%) | 79.86 | – | Unverified
2 | shufflenet-v2 (T:resnet-32x4, S:shufflenet-v2) | Top-1 Accuracy (%) | 78.76 | – | Unverified
3 | MV-MR (T: CLIP/ViT-B-16 S: resnet50) | Top-1 Accuracy (%) | 78.6 | – | Unverified
4 | resnet8x4 (T: resnet32x4 S: resnet8x4) | Top-1 Accuracy (%) | 78.28 | – | Unverified
5 | resnet8x4 (T: resnet32x4 S: resnet8x4 [modified]) | Top-1 Accuracy (%) | 78.08 | – | Unverified
6 | ReviewKD++ (T:resnet-32x4, S:shufflenet-v2) | Top-1 Accuracy (%) | 77.93 | – | Unverified
7 | ReviewKD++ (T:resnet-32x4, S:shufflenet-v1) | Top-1 Accuracy (%) | 77.68 | – | Unverified
8 | resnet8x4 (T: resnet32x4 S: resnet8x4) | Top-1 Accuracy (%) | 77.5 | – | Unverified
9 | resnet8x4 (T: resnet32x4 S: resnet8x4) | Top-1 Accuracy (%) | 76.68 | – | Unverified
10 | resnet8x4 (T: resnet32x4 S: resnet8x4) | Top-1 Accuracy (%) | 76.31 | – | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | LSHFM (T: ResNet101 S: ResNet50) | mAP | 93.17 | – | Unverified
2 | LSHFM (T: ResNet101 S: MobileNetV2) | mAP | 90.14 | – | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TIE-KD (T: Adabins S: MobileNetV2) | RMSE | 2.43 | – | Unverified