
Knowledge Distillation

Knowledge distillation is the process of transferring knowledge from a large model to a smaller one. While large models (such as very deep neural networks or ensembles of many models) have higher knowledge capacity than small models, this capacity may not be fully utilized, so a smaller "student" model trained to mimic the larger "teacher" can often retain much of the teacher's accuracy at a fraction of the inference cost.
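The classic recipe underlying much of this literature is the soft-target loss of Hinton et al. (2015): the student is trained against a temperature-softened copy of the teacher's output distribution alongside the usual hard labels. Below is a minimal sketch, assuming PyTorch; the function name and the temperature/weighting defaults are illustrative choices, not taken from any particular paper listed on this page.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets,
                      temperature=4.0, alpha=0.9):
    """Soft-target knowledge distillation loss (Hinton et al., 2015)."""
    # KL divergence between temperature-softened teacher and student
    # distributions, scaled by T^2 so gradient magnitudes stay comparable.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # Ordinary cross-entropy against the ground-truth hard labels.
    hard = F.cross_entropy(student_logits, targets)

    # alpha controls how much the student listens to the teacher
    # versus the labels.
    return alpha * soft + (1.0 - alpha) * hard
```

In training, teacher_logits would come from a forward pass of the frozen teacher on the same batch; only the student's parameters are updated.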

Papers

Showing 2851-2900 of 4240 papers (page 58 of 85)

Title | Status | Hype
SC2 Benchmark: Supervised Compression for Split Computing | | 0
Graph Flow: Cross-layer Graph Flow Distillation for Dual Efficient Medical Image Segmentation | Code | 1
Unified Visual Transformer Compression | Code | 1
SATS: Self-Attention Transfer for Continual Semantic Segmentation | Code | 1
On the benefits of knowledge distillation for adversarial robustness | | 0
DS3-Net: Difficulty-perceived Common-to-T1ce Semi-Supervised Multimodal MRI Synthesis Network | | 0
CEKD:Cross Ensemble Knowledge Distillation for Augmented Fine-grained Data | | 0
CMKD: CNN/Transformer-Based Cross-Model Knowledge Distillation for Audio Classification | Code | 3
Enabling Multimodal Generation on CLIP via Vision-Language Knowledge Distillation | | 0
Wavelet Knowledge Distillation: Towards Efficient Image-to-Image Translation | | 0
Medical Image Segmentation on MRI Images with Missing Modalities: A Review | | 0
Deep Class Incremental Learning from Decentralized Data | Code | 0
Improving Neural ODEs via Knowledge Distillation | | 0
Look Backward and Forward: Self-Knowledge Distillation with Bidirectional Decoder for Neural Machine Translation | | 0
Model-Architecture Co-Design for High Performance Temporal GNN Inference on FPGA | Code | 0
Prediction-Guided Distillation for Dense Object Detection | Code | 1
Membership Privacy Protection for Image Translation Models via Adversarial Knowledge Distillation | | 0
Representation Compensation Networks for Continual Semantic Segmentation | Code | 1
Knowledge Distillation as Efficient Pre-training: Faster Convergence, Higher Data-efficiency, and Better Transferability | Code | 1
Efficient Sub-structured Knowledge Distillation | Code | 0
How many Observations are Enough? Knowledge Distillation for Trajectory Forecasting | | 0
PyNET-QxQ: An Efficient PyNET Variant for QxQ Bayer Pattern Demosaicing in CMOS Image Sensors | Code | 0
On Generalizing Beyond Domains in Cross-Domain Continual Learning | | 0
Multi-trial Neural Architecture Search with Lottery Tickets | | 0
Overcoming Catastrophic Forgetting beyond Continual Learning: Balanced Training for Neural Machine Translation | Code | 1
Student Becomes Decathlon Master in Retinal Vessel Segmentation via Dual-teacher Multi-target Domain Adaptation | Code | 0
Enhance Language Identification using Dual-mode Model with Knowledge Distillation | Code | 0
Ensemble Knowledge Guided Sub-network Search and Fine-tuning for Filter Pruning | Code | 1
Consistent Representation Learning for Continual Relation Extraction | Code | 1
Better Supervisory Signals by Observing Learning Paths | Code | 0
MIAShield: Defending Membership Inference Attacks via Preemptive Exclusion of Members | | 0
X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning | Code | 1
TRILLsson: Distilled Universal Paralinguistic Speech Representations | | 0
Dual Embodied-Symbolic Concept Representations for Deep Learning | | 0
Self-Supervised Vision Transformers Learn Visual Concepts in Histopathology | Code | 1
Confidence Based Bidirectional Global Context Aware Training Framework for Neural Machine Translation | | 0
TransKD: Transformer Knowledge Distillation for Efficient Semantic Segmentation | Code | 1
Content-Variant Reference Image Quality Assessment via Knowledge Distillation | Code | 1
Joint Answering and Explanation for Visual Commonsense Reasoning | Code | 0
Bridging the Gap Between Patient-specific and Patient-independent Seizure Prediction via Knowledge Distillation | | 0
Learn From the Past: Experience Ensemble Knowledge Distillation | | 0
Efficient Video Segmentation Models with Per-frame Inference | | 0
Are All Linear Regions Created Equal? | Code | 0
Multi-Teacher Knowledge Distillation for Incremental Implicitly-Refined Classification | | 0
Distilled Neural Networks for Efficient Learning to Rank | Code | 0
A Novel Architecture Slimming Method for Network Pruning and Knowledge Distillation | | 0
Learning Bayesian Sparse Networks with Full Experience Replay for Continual Learning | | 0
CaMEL: Mean Teacher Learning for Image Captioning | Code | 1
Cross-Task Knowledge Distillation in Multi-Task Recommendation | | 0
General Cyclical Training of Neural Networks | Code | 1

Benchmark Results

Each entry lists the teacher (T:) and student (S:) architectures used for distillation. Claimed values are the numbers reported in the papers; none of the results shown here have been verified yet.

# | Model | Metric | Claimed | Verified | Status
1 | ScaleKD (T:BEiT-L S:ViT-B/14) | Top-1 accuracy (%) | 86.43 | | Unverified
2 | ScaleKD (T:Swin-L S:ViT-B/16) | Top-1 accuracy (%) | 85.53 | | Unverified
3 | ScaleKD (T:Swin-L S:ViT-S/16) | Top-1 accuracy (%) | 83.93 | | Unverified
4 | ScaleKD (T:Swin-L S:Swin-T) | Top-1 accuracy (%) | 83.8 | | Unverified
5 | KD++ (T: regnety-16GF S:ViT-B) | Top-1 accuracy (%) | 83.6 | | Unverified
6 | VkD (T:RegNety 160 S:DeiT-S) | Top-1 accuracy (%) | 82.9 | | Unverified
7 | SpectralKD (T:Swin-S S:Swin-T) | Top-1 accuracy (%) | 82.7 | | Unverified
8 | ScaleKD (T:Swin-L S:ResNet-50) | Top-1 accuracy (%) | 82.55 | | Unverified
9 | DiffKD (T:Swin-L S: Swin-T) | Top-1 accuracy (%) | 82.5 | | Unverified
10 | DIST (T: Swin-L S: Swin-T) | Top-1 accuracy (%) | 82.3 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | SRD (T:resnet-32x4, S:shufflenet-v2) | Top-1 Accuracy (%) | 79.86 | | Unverified
2 | shufflenet-v2 (T:resnet-32x4, S:shufflenet-v2) | Top-1 Accuracy (%) | 78.76 | | Unverified
3 | MV-MR (T: CLIP/ViT-B-16 S: resnet50) | Top-1 Accuracy (%) | 78.6 | | Unverified
4 | resnet8x4 (T: resnet32x4 S: resnet8x4) | Top-1 Accuracy (%) | 78.28 | | Unverified
5 | resnet8x4 (T: resnet32x4 S: resnet8x4 [modified]) | Top-1 Accuracy (%) | 78.08 | | Unverified
6 | ReviewKD++ (T:resnet-32x4, S:shufflenet-v2) | Top-1 Accuracy (%) | 77.93 | | Unverified
7 | ReviewKD++ (T:resnet-32x4, S:shufflenet-v1) | Top-1 Accuracy (%) | 77.68 | | Unverified
8 | resnet8x4 (T: resnet32x4 S: resnet8x4) | Top-1 Accuracy (%) | 77.5 | | Unverified
9 | resnet8x4 (T: resnet32x4 S: resnet8x4) | Top-1 Accuracy (%) | 76.68 | | Unverified
10 | resnet8x4 (T: resnet32x4 S: resnet8x4) | Top-1 Accuracy (%) | 76.31 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | LSHFM (T: ResNet101 S: ResNet50) | mAP | 93.17 | | Unverified
2 | LSHFM (T: ResNet101 S: MobileNetV2) | mAP | 90.14 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | TIE-KD (T: Adabins S: MobileNetV2) | RMSE | 2.43 | | Unverified