SOTAVerified

Knowledge Distillation

Knowledge distillation is the process of transferring knowledge from a large model to a smaller one. While large models (such as very deep neural networks or ensembles of many models) have higher knowledge capacity than small models, this capacity might not be fully utilized; a compact student trained to match the teacher's outputs can therefore often recover much of the teacher's accuracy at a fraction of the inference cost.

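A minimal sketch of the classic soft-target objective (Hinton et al., 2015), which many of the methods listed below extend: the student is trained on a temperature-softened KL term against the teacher's logits, blended with the ordinary cross-entropy on the hard labels. The function name, temperature, and weighting below are illustrative assumptions, not taken from any specific paper on this page.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    """Soft-target KD loss: KL(teacher || student) at temperature T,
    blended with hard-label cross-entropy. All names are illustrative."""
    # Soften both distributions so the teacher's "dark knowledge"
    # (relative probabilities of the wrong classes) is visible to the student.
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # The T^2 factor keeps the KD gradient scale comparable to the CE term.
    kd = F.kl_div(log_student, soft_targets, reduction="batchmean") * temperature ** 2
    # Standard supervised loss on the ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce

# Example usage with random tensors (batch of 8, 100 classes):
# student_out, teacher_out = torch.randn(8, 100), torch.randn(8, 100)
# labels = torch.randint(0, 100, (8,))
# loss = distillation_loss(student_out, teacher_out, labels)
```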
Papers

Showing 351–400 of 4240 papers

Title | Status | Hype
NormKD: Normalized Logits for Knowledge Distillation | Code | 1
BearingPGA-Net: A Lightweight and Deployable Bearing Fault Diagnosis Network via Decoupled Knowledge Distillation and FPGA Acceleration | Code | 1
f-Divergence Minimization for Sequence-Level Knowledge Distillation | Code | 1
Fitting Auditory Filterbanks with Multiresolution Neural Networks | Code | 1
MetricGAN-OKD: Multi-Metric Optimization of MetricGAN via Online Knowledge Distillation for Speech Enhancement | Code | 1
CLIP-KD: An Empirical Study of CLIP Model Distillation | Code | 1
DPM-OT: A New Diffusion Probabilistic Model Based on Optimal Transport | Code | 1
Reverse Knowledge Distillation: Training a Large Model using a Small One for Retinal Image Matching on Limited Data | Code | 1
FedDefender: Client-Side Attack-Tolerant Federated Learning | Code | 1
Class-relation Knowledge Distillation for Novel Class Discovery | Code | 1
Cumulative Spatial Knowledge Distillation for Vision Transformers | Code | 1
DARTS: Double Attention Reference-based Transformer for Super-resolution | Code | 1
Multimodal Distillation for Egocentric Action Recognition | Code | 1
Learning to Retrieve In-Context Examples for Large Language Models | Code | 1
mCLIP: Multilingual CLIP via Cross-lingual Transfer | Code | 1
CMDFusion: Bidirectional Fusion Network with Cross-modality Knowledge Distillation for LIDAR Semantic Segmentation | Code | 1
Distilling Large Vision-Language Model with Out-of-Distribution Generalizability | Code | 1
MDViT: Multi-domain Vision Transformer for Small Medical Image Segmentation Datasets | Code | 1
FedDefender: Backdoor Attack Defense in Federated Learning | Code | 1
Quantization Variation: A New Perspective on Training Transformers with Low-Bit Precision | Code | 1
Audio Embeddings as Teachers for Music Classification | Code | 1
NaturalInversion: Data-Free Image Synthesis Improving Real-World Consistency | Code | 1
Mitigating Accuracy-Robustness Trade-off via Balanced Multi-Teacher Adversarial Distillation | Code | 1
On information captured by neural networks: connections with memorization and generalization | Code | 1
Robust Spatiotemporal Traffic Forecasting with Reinforced Dynamic Adversarial Training | Code | 1
CrossKD: Cross-Head Knowledge Distillation for Object Detection | Code | 1
Coaching a Teachable Student | Code | 1
BPKD: Boundary Privileged Knowledge Distillation For Semantic Segmentation | Code | 1
Adaptive Multi-Teacher Knowledge Distillation with Meta-Learning | Code | 1
Are Intermediate Layers and Labels Really Necessary? A General Language Model Distillation Method | Code | 1
GKD: A General Knowledge Distillation Framework for Large-scale Pre-trained Language Model | Code | 1
RankFormer: Listwise Learning-to-Rank Using Listwide Labels | Code | 1
Self-supervised Audio Teacher-Student Transformer for Both Clip-level and Frame-level Tasks | Code | 1
Orca: Progressive Learning from Complex Explanation Traces of GPT-4 | Code | 1
I^3 Retriever: Incorporating Implicit Interaction in Pre-trained Language Models for Passage Retrieval | Code | 1
Revisiting Data-Free Knowledge Distillation with Poisoned Teachers | Code | 1
PlaSma: Making Small Language Models Better Procedural Knowledge Models for (Counterfactual) Planning | Code | 1
Semi-supervised Pathological Image Segmentation via Cross Distillation of Multiple Attentions | Code | 1
Learning to Learn from APIs: Black-Box Data-Free Meta-Learning | Code | 1
DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models | Code | 1
One-Step Knowledge Distillation and Fine-Tuning in Using Large Pre-Trained Self-Supervised Learning Models for Speaker Verification | Code | 1
FoPro-KD: Fourier Prompted Effective Knowledge Distillation for Long-Tailed Medical Image Recognition | Code | 1
Towards Better Entity Linking with Multi-View Enhanced Distillation | Code | 1
Improving Knowledge Distillation via Regularizing Feature Norm and Direction | Code | 1
OVO: Open-Vocabulary Occupancy | Code | 1
Towards Higher Pareto Frontier in Multilingual Machine Translation | Code | 1
VanillaKD: Revisit the Power of Vanilla Knowledge Distillation from Small Scale to Large Scale | Code | 1
Knowledge Diffusion for Distillation | Code | 1
How to Distill your BERT: An Empirical Study on the Impact of Weight Initialisation and Distillation Objectives | Code | 1
NORM: Knowledge Distillation via N-to-One Representation Matching | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | ScaleKD (T: BEiT-L, S: ViT-B/14) | Top-1 accuracy % | 86.43 | – | Unverified
2 | ScaleKD (T: Swin-L, S: ViT-B/16) | Top-1 accuracy % | 85.53 | – | Unverified
3 | ScaleKD (T: Swin-L, S: ViT-S/16) | Top-1 accuracy % | 83.93 | – | Unverified
4 | ScaleKD (T: Swin-L, S: Swin-T) | Top-1 accuracy % | 83.8 | – | Unverified
5 | KD++ (T: regnety-16GF, S: ViT-B) | Top-1 accuracy % | 83.6 | – | Unverified
6 | VkD (T: RegNety 160, S: DeiT-S) | Top-1 accuracy % | 82.9 | – | Unverified
7 | SpectralKD (T: Swin-S, S: Swin-T) | Top-1 accuracy % | 82.7 | – | Unverified
8 | ScaleKD (T: Swin-L, S: ResNet-50) | Top-1 accuracy % | 82.55 | – | Unverified
9 | DiffKD (T: Swin-L, S: Swin-T) | Top-1 accuracy % | 82.5 | – | Unverified
10 | DIST (T: Swin-L, S: Swin-T) | Top-1 accuracy % | 82.3 | – | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | SRD (T: resnet-32x4, S: shufflenet-v2) | Top-1 Accuracy (%) | 79.86 | – | Unverified
2 | shufflenet-v2 (T: resnet-32x4, S: shufflenet-v2) | Top-1 Accuracy (%) | 78.76 | – | Unverified
3 | MV-MR (T: CLIP/ViT-B-16, S: resnet50) | Top-1 Accuracy (%) | 78.6 | – | Unverified
4 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 Accuracy (%) | 78.28 | – | Unverified
5 | resnet8x4 (T: resnet32x4, S: resnet8x4 [modified]) | Top-1 Accuracy (%) | 78.08 | – | Unverified
6 | ReviewKD++ (T: resnet-32x4, S: shufflenet-v2) | Top-1 Accuracy (%) | 77.93 | – | Unverified
7 | ReviewKD++ (T: resnet-32x4, S: shufflenet-v1) | Top-1 Accuracy (%) | 77.68 | – | Unverified
8 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 Accuracy (%) | 77.5 | – | Unverified
9 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 Accuracy (%) | 76.68 | – | Unverified
10 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 Accuracy (%) | 76.31 | – | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | LSHFM (T: ResNet101, S: ResNet50) | mAP | 93.17 | – | Unverified
2 | LSHFM (T: ResNet101, S: MobileNetV2) | mAP | 90.14 | – | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | TIE-KD (T: Adabins, S: MobileNetV2) | RMSE | 2.43 | – | Unverified