SOTAVerified

Knowledge Distillation

Knowledge distillation is the process of transferring knowledge from a large model to a smaller one. While large models (such as very deep neural networks or ensembles of many models) have higher knowledge capacity than small models, this capacity might not be fully utilized; a compact student trained to match the teacher's outputs can therefore often recover much of the teacher's accuracy at a fraction of the inference cost.

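A minimal sketch of the classic soft-target objective (Hinton et al., 2015), which many of the methods listed below extend: the student is trained on a temperature-softened KL term against the teacher's logits, blended with the ordinary cross-entropy on the hard labels. The function name, temperature, and weighting below are illustrative assumptions, not taken from any specific paper on this page.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    """Soft-target KD loss: KL(teacher || student) at temperature T,
    blended with hard-label cross-entropy. All names are illustrative."""
    # Soften both distributions so the teacher's "dark knowledge"
    # (relative probabilities of the wrong classes) is visible to the student.
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # The T^2 factor keeps the KD gradient scale comparable to the CE term.
    kd = F.kl_div(log_student, soft_targets, reduction="batchmean") * temperature ** 2
    # Standard supervised loss on the ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce

# Example usage with random tensors (batch of 8, 100 classes):
# student_out, teacher_out = torch.randn(8, 100), torch.randn(8, 100)
# labels = torch.randint(0, 100, (8,))
# loss = distillation_loss(student_out, teacher_out, labels)
```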
Papers

Showing 351–400 of 4240 papers

Title | Status | Hype
NormKD: Normalized Logits for Knowledge Distillation | Code | 1
BearingPGA-Net: A Lightweight and Deployable Bearing Fault Diagnosis Network via Decoupled Knowledge Distillation and FPGA Acceleration | Code | 1
f-Divergence Minimization for Sequence-Level Knowledge Distillation | Code | 1
Fitting Auditory Filterbanks with Multiresolution Neural Networks | Code | 1
MetricGAN-OKD: Multi-Metric Optimization of MetricGAN via Online Knowledge Distillation for Speech Enhancement | Code | 1
CLIP-KD: An Empirical Study of CLIP Model Distillation | Code | 1
DPM-OT: A New Diffusion Probabilistic Model Based on Optimal Transport | Code | 1
Reverse Knowledge Distillation: Training a Large Model using a Small One for Retinal Image Matching on Limited Data | Code | 1
FedDefender: Client-Side Attack-Tolerant Federated Learning | Code | 1
Class-relation Knowledge Distillation for Novel Class Discovery | Code | 1
Cumulative Spatial Knowledge Distillation for Vision Transformers | Code | 1
DARTS: Double Attention Reference-based Transformer for Super-resolution | Code | 1
Multimodal Distillation for Egocentric Action Recognition | Code | 1
Learning to Retrieve In-Context Examples for Large Language Models | Code | 1
mCLIP: Multilingual CLIP via Cross-lingual Transfer | Code | 1
CMDFusion: Bidirectional Fusion Network with Cross-modality Knowledge Distillation for LIDAR Semantic Segmentation | Code | 1
Distilling Large Vision-Language Model with Out-of-Distribution Generalizability | Code | 1
MDViT: Multi-domain Vision Transformer for Small Medical Image Segmentation Datasets | Code | 1
FedDefender: Backdoor Attack Defense in Federated Learning | Code | 1
Quantization Variation: A New Perspective on Training Transformers with Low-Bit Precision | Code | 1
Audio Embeddings as Teachers for Music Classification | Code | 1
NaturalInversion: Data-Free Image Synthesis Improving Real-World Consistency | Code | 1
Mitigating Accuracy-Robustness Trade-off via Balanced Multi-Teacher Adversarial Distillation | Code | 1
On information captured by neural networks: connections with memorization and generalization | Code | 1
Robust Spatiotemporal Traffic Forecasting with Reinforced Dynamic Adversarial Training | Code | 1
CrossKD: Cross-Head Knowledge Distillation for Object Detection | Code | 1
Coaching a Teachable Student | Code | 1
BPKD: Boundary Privileged Knowledge Distillation For Semantic Segmentation | Code | 1
Adaptive Multi-Teacher Knowledge Distillation with Meta-Learning | Code | 1
Are Intermediate Layers and Labels Really Necessary? A General Language Model Distillation Method | Code | 1
GKD: A General Knowledge Distillation Framework for Large-scale Pre-trained Language Model | Code | 1
RankFormer: Listwise Learning-to-Rank Using Listwide Labels | Code | 1
Self-supervised Audio Teacher-Student Transformer for Both Clip-level and Frame-level Tasks | Code | 1
Orca: Progressive Learning from Complex Explanation Traces of GPT-4 | Code | 1
I^3 Retriever: Incorporating Implicit Interaction in Pre-trained Language Models for Passage Retrieval | Code | 1
Revisiting Data-Free Knowledge Distillation with Poisoned Teachers | Code | 1
PlaSma: Making Small Language Models Better Procedural Knowledge Models for (Counterfactual) Planning | Code | 1
Semi-supervised Pathological Image Segmentation via Cross Distillation of Multiple Attentions | Code | 1
Learning to Learn from APIs: Black-Box Data-Free Meta-Learning | Code | 1
DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models | Code | 1
One-Step Knowledge Distillation and Fine-Tuning in Using Large Pre-Trained Self-Supervised Learning Models for Speaker Verification | Code | 1
FoPro-KD: Fourier Prompted Effective Knowledge Distillation for Long-Tailed Medical Image Recognition | Code | 1
Towards Better Entity Linking with Multi-View Enhanced Distillation | Code | 1
Improving Knowledge Distillation via Regularizing Feature Norm and Direction | Code | 1
OVO: Open-Vocabulary Occupancy | Code | 1
Towards Higher Pareto Frontier in Multilingual Machine Translation | Code | 1
VanillaKD: Revisit the Power of Vanilla Knowledge Distillation from Small Scale to Large Scale | Code | 1
Knowledge Diffusion for Distillation | Code | 1
How to Distill your BERT: An Empirical Study on the Impact of Weight Initialisation and Distillation Objectives | Code | 1
NORM: Knowledge Distillation via N-to-One Representation Matching | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | ScaleKD (T: BEiT-L, S: ViT-B/14) | Top-1 accuracy % | 86.43 | – | Unverified
2 | ScaleKD (T: Swin-L, S: ViT-B/16) | Top-1 accuracy % | 85.53 | – | Unverified
3 | ScaleKD (T: Swin-L, S: ViT-S/16) | Top-1 accuracy % | 83.93 | – | Unverified
4 | ScaleKD (T: Swin-L, S: Swin-T) | Top-1 accuracy % | 83.8 | – | Unverified
5 | KD++ (T: regnety-16GF, S: ViT-B) | Top-1 accuracy % | 83.6 | – | Unverified
6 | VkD (T: RegNety 160, S: DeiT-S) | Top-1 accuracy % | 82.9 | – | Unverified
7 | SpectralKD (T: Swin-S, S: Swin-T) | Top-1 accuracy % | 82.7 | – | Unverified
8 | ScaleKD (T: Swin-L, S: ResNet-50) | Top-1 accuracy % | 82.55 | – | Unverified
9 | DiffKD (T: Swin-L, S: Swin-T) | Top-1 accuracy % | 82.5 | – | Unverified
10 | DIST (T: Swin-L, S: Swin-T) | Top-1 accuracy % | 82.3 | – | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | SRD (T: resnet-32x4, S: shufflenet-v2) | Top-1 Accuracy (%) | 79.86 | – | Unverified
2 | shufflenet-v2 (T: resnet-32x4, S: shufflenet-v2) | Top-1 Accuracy (%) | 78.76 | – | Unverified
3 | MV-MR (T: CLIP/ViT-B-16, S: resnet50) | Top-1 Accuracy (%) | 78.6 | – | Unverified
4 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 Accuracy (%) | 78.28 | – | Unverified
5 | resnet8x4 (T: resnet32x4, S: resnet8x4 [modified]) | Top-1 Accuracy (%) | 78.08 | – | Unverified
6 | ReviewKD++ (T: resnet-32x4, S: shufflenet-v2) | Top-1 Accuracy (%) | 77.93 | – | Unverified
7 | ReviewKD++ (T: resnet-32x4, S: shufflenet-v1) | Top-1 Accuracy (%) | 77.68 | – | Unverified
8 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 Accuracy (%) | 77.5 | – | Unverified
9 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 Accuracy (%) | 76.68 | – | Unverified
10 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 Accuracy (%) | 76.31 | – | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | LSHFM (T: ResNet101, S: ResNet50) | mAP | 93.17 | – | Unverified
2 | LSHFM (T: ResNet101, S: MobileNetV2) | mAP | 90.14 | – | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | TIE-KD (T: Adabins, S: MobileNetV2) | RMSE | 2.43 | – | Unverified