Knowledge Distillation

Knowledge distillation is the process of transferring knowledge from a large model to a smaller one. While large models (such as very deep neural networks or ensembles of many models) have higher knowledge capacity than small models, this capacity might not be fully utilized, yet evaluating the large model remains just as expensive even when much of that capacity goes unused. Distillation therefore trains a compact student model to reproduce the behavior of a larger teacher, typically by matching the teacher's softened output distribution in addition to the ground-truth labels, so the student retains much of the teacher's accuracy at a fraction of the inference cost.
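
To make this concrete, below is a minimal sketch of the classic soft-target distillation loss (Hinton et al., 2015) in PyTorch. The temperature T and mixing weight alpha are illustrative defaults, not values taken from any paper listed below.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Weighted sum of hard-label cross-entropy and a soft-target KD term.

    A minimal sketch; T and alpha are illustrative, not prescribed values.
    """
    # Standard cross-entropy against the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    # KL divergence between the temperature-softened teacher and student
    # distributions, scaled by T^2 to keep gradient magnitudes comparable.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T ** 2)
    return alpha * hard_loss + (1 - alpha) * soft_loss
```

During training, the teacher runs in eval mode with gradients disabled and only the student's parameters are updated; the student is then deployed on its own.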

Papers

Showing 701–750 of 4240 papers

Title | Status | Hype
KD-Lib: A PyTorch library for Knowledge Distillation, Pruning and Quantization | Code | 1
KD-LoRA: A Hybrid Approach to Efficient Fine-Tuning with LoRA and Knowledge Distillation | Code | 1
A Discrepancy Aware Framework for Robust Anomaly Detection | Code | 1
Discriminator-Cooperated Feature Map Distillation for GAN Compression | Code | 1
AdaDistill: Adaptive Knowledge Distillation for Deep Face Recognition | Code | 1
Knowledge Condensation Distillation | Code | 1
Distilling Knowledge from Graph Convolutional Networks | Code | 1
Confidence-Aware Multi-Teacher Knowledge Distillation | Code | 1
Knowledge Distillation as Efficient Pre-training: Faster Convergence, Higher Data-efficiency, and Better Transferability | Code | 1
Knowledge Distillation based Degradation Estimation for Blind Super-Resolution | Code | 1
Knowledge Distillation for Feature Extraction in Underwater VSLAM | Code | 1
Knowledge Distillation for Multi-task Learning | Code | 1
Conformer and Blind Noisy Students for Improved Image Quality Assessment | Code | 1
Knowledge Distillation from A Stronger Teacher | Code | 1
Directed Acyclic Transformer for Non-Autoregressive Machine Translation | Code | 1
ConNER: Consistency Training for Cross-lingual Named Entity Recognition | Code | 1
Consensual Collaborative Training And Knowledge Distillation Based Facial Expression Recognition Under Noisy Annotations | Code | 1
Consistent Representation Learning for Continual Relation Extraction | Code | 1
DisCo: Distilled Student Models Co-training for Semi-supervised Text Mining | Code | 1
Camera clustering for scalable stream-based active distillation | Code | 1
Designing Large Foundation Models for Efficient Training and Inference: A Survey | Code | 1
Content-Aware GAN Compression | Code | 1
AGKD-BML: Defense Against Adversarial Attack by Attention Guided Knowledge Distillation and Bi-directional Metric Learning | Code | 1
Content-Variant Reference Image Quality Assessment via Knowledge Distillation | Code | 1
Context-Aware Image Inpainting with Learned Semantic Priors | Code | 1
Audio Embeddings as Teachers for Music Classification | Code | 1
CTC-based Non-autoregressive Textless Speech-to-Speech Translation | Code | 1
Knowledge Inheritance for Pre-trained Language Models | Code | 1
Knowledge Transfer via Dense Cross-Layer Mutual-Distillation | Code | 1
LabelDistill: Label-guided Cross-modal Knowledge Distillation for Camera-based 3D Object Detection | Code | 1
Continual All-in-One Adverse Weather Removal with Knowledge Replay on a Unified Network Structure | Code | 1
Audio-Visual Representation Learning via Knowledge Distillation from Speech Foundation Models | Code | 1
Continual Collaborative Distillation for Recommender System | Code | 1
Agree to Disagree: Adaptive Ensemble Knowledge Distillation in Gradient Space | Code | 1
Anomaly Detection in Video via Self-Supervised and Multi-Task Learning | Code | 1
DistilCSE: Effective Knowledge Distillation For Contrastive Sentence Embeddings | Code | 1
Continual evaluation for lifelong learning: Identifying the stability gap | Code | 1
Learn from Foundation Model: Fruit Detection Model without Manual Annotation | Code | 1
AICSD: Adaptive Inter-Class Similarity Distillation for Semantic Segmentation | Code | 1
Learning Compatible Embeddings | Code | 1
Continual Learning for Image Segmentation with Dynamic Query | Code | 1
Continual Learning for LiDAR Semantic Segmentation: Class-Incremental and Coarse-to-Fine strategies on Sparse Data | Code | 1
Distillation and Refinement of Reasoning in Small Language Models for Document Re-ranking | Code | 1
Distilling Holistic Knowledge with Graph Neural Networks | Code | 1
Learning Generalizable Models for Vehicle Routing Problems via Knowledge Distillation | Code | 1
Learning Light-Weight Translation Models from Deep Transformer | Code | 1
Cumulative Spatial Knowledge Distillation for Vision Transformers | Code | 1
DiGA: Distil to Generalize and then Adapt for Domain Adaptive Semantic Segmentation | Code | 1
Learning to Learn Parameterized Classification Networks for Scalable Input Images | Code | 1
CaMEL: Mean Teacher Learning for Image Captioning | Code | 1
Page 15 of 85

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | ScaleKD (T: BEiT-L, S: ViT-B/14) | Top-1 accuracy (%) | 86.43 | – | Unverified
2 | ScaleKD (T: Swin-L, S: ViT-B/16) | Top-1 accuracy (%) | 85.53 | – | Unverified
3 | ScaleKD (T: Swin-L, S: ViT-S/16) | Top-1 accuracy (%) | 83.93 | – | Unverified
4 | ScaleKD (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 83.8 | – | Unverified
5 | KD++ (T: regnety-16GF, S: ViT-B) | Top-1 accuracy (%) | 83.6 | – | Unverified
6 | VkD (T: RegNety 160, S: DeiT-S) | Top-1 accuracy (%) | 82.9 | – | Unverified
7 | SpectralKD (T: Swin-S, S: Swin-T) | Top-1 accuracy (%) | 82.7 | – | Unverified
8 | ScaleKD (T: Swin-L, S: ResNet-50) | Top-1 accuracy (%) | 82.55 | – | Unverified
9 | DiffKD (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 82.5 | – | Unverified
10 | DIST (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 82.3 | – | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SRD (T: resnet-32x4, S: shufflenet-v2) | Top-1 accuracy (%) | 79.86 | – | Unverified
2 | shufflenet-v2 (T: resnet-32x4, S: shufflenet-v2) | Top-1 accuracy (%) | 78.76 | – | Unverified
3 | MV-MR (T: CLIP/ViT-B-16, S: resnet50) | Top-1 accuracy (%) | 78.6 | – | Unverified
4 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 78.28 | – | Unverified
5 | resnet8x4 (T: resnet32x4, S: resnet8x4 [modified]) | Top-1 accuracy (%) | 78.08 | – | Unverified
6 | ReviewKD++ (T: resnet-32x4, S: shufflenet-v2) | Top-1 accuracy (%) | 77.93 | – | Unverified
7 | ReviewKD++ (T: resnet-32x4, S: shufflenet-v1) | Top-1 accuracy (%) | 77.68 | – | Unverified
8 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 77.5 | – | Unverified
9 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 76.68 | – | Unverified
10 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 76.31 | – | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | LSHFM (T: ResNet101, S: ResNet50) | mAP | 93.17 | – | Unverified
2 | LSHFM (T: ResNet101, S: MobileNetV2) | mAP | 90.14 | – | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TIE-KD (T: Adabins, S: MobileNetV2) | RMSE | 2.43 | – | Unverified