
Knowledge Distillation

Knowledge distillation is the process of transferring knowledge from a large model to a smaller one. While large models (such as very deep neural networks or ensembles of many models) have higher knowledge capacity than small models, this capacity might not be fully utilized. Distillation therefore trains a compact student model to mimic the outputs of a large teacher, retaining much of the teacher's accuracy at a far lower inference cost.
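
As a concrete illustration, the most common logit-matching formulation trains the student to reproduce the teacher's temperature-softened output distribution while still fitting the ground-truth labels. The sketch below assumes PyTorch; the function name and the defaults for the temperature T and mixing weight alpha are illustrative, not taken from any of the papers listed on this page.

import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Hypothetical sketch of Hinton-style logit distillation.
    # Soften both distributions with temperature T and match them via KL divergence.
    soft_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_soft_student = F.log_softmax(student_logits / T, dim=-1)
    # The T^2 factor keeps gradient magnitudes comparable as T changes.
    kd_term = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * (T * T)
    # Ordinary cross-entropy against the hard labels keeps the student grounded.
    ce_term = F.cross_entropy(student_logits, labels)
    return alpha * kd_term + (1.0 - alpha) * ce_term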

Papers

Showing 2526–2550 of 4240 papers (page 102 of 170)

Title | Status | Hype
Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA | - | 0
Remembering Transformer for Continual Learning | - | 0
Remining Hard Negatives for Generative Pseudo Labeled Domain Adaptation | - | 0
Remote Sensing Image Classification with Decoupled Knowledge Distillation | - | 0
Removing Rain Streaks via Task Transfer Learning | - | 0
Representation Consolidation from Multiple Expert Teachers | - | 0
Representation Disparity-aware Distillation for 3D Object Detection | - | 0
Representation Transfer by Optimal Transport | - | 0
Research on Multilingual News Clustering Based on Cross-Language Word Embeddings | - | 0
Research on the Online Update Method for Retrieval-Augmented Generation (RAG) Model with Incremental Learning | - | 0
Residual Knowledge Distillation | - | 0
ResKD: Residual-Guided Knowledge Distillation | - | 0
Resolution-Based Distillation for Efficient Histology Image Classification | - | 0
Resource-Efficient Beam Prediction in mmWave Communications with Multimodal Realistic Simulation Framework | - | 0
REFT: Resource-Efficient Federated Training Framework for Heterogeneous and Resource-Constrained Environments | - | 0
Respecting Transfer Gap in Knowledge Distillation | - | 0
Response-based Distillation for Incremental Object Detection | - | 0
Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers | - | 0
Rethinking Attention Mechanism in Time Series Classification | - | 0
Rethinking Feature-Based Knowledge Distillation for Face Recognition | - | 0
Rethinking Invariance Regularization in Adversarial Training to Improve Robustness-Accuracy Trade-off | - | 0
Rethinking Knowledge Distillation via Cross-Entropy | - | 0
Rethinking Knowledge in Distillation: An In-context Sample Retrieval Perspective | - | 0
Rethinking Position Bias Modeling with Knowledge Distillation for CTR Prediction | - | 0
Rethinking Soft Labels for Knowledge Distillation: A Bias–Variance Tradeoff Perspective | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | ScaleKD (T: BEiT-L, S: ViT-B/14) | Top-1 accuracy (%) | 86.43 | - | Unverified
2 | ScaleKD (T: Swin-L, S: ViT-B/16) | Top-1 accuracy (%) | 85.53 | - | Unverified
3 | ScaleKD (T: Swin-L, S: ViT-S/16) | Top-1 accuracy (%) | 83.93 | - | Unverified
4 | ScaleKD (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 83.8 | - | Unverified
5 | KD++ (T: regnety-16GF, S: ViT-B) | Top-1 accuracy (%) | 83.6 | - | Unverified
6 | VkD (T: RegNety 160, S: DeiT-S) | Top-1 accuracy (%) | 82.9 | - | Unverified
7 | SpectralKD (T: Swin-S, S: Swin-T) | Top-1 accuracy (%) | 82.7 | - | Unverified
8 | ScaleKD (T: Swin-L, S: ResNet-50) | Top-1 accuracy (%) | 82.55 | - | Unverified
9 | DiffKD (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 82.5 | - | Unverified
10 | DIST (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 82.3 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SRD (T: resnet-32x4, S: shufflenet-v2) | Top-1 Accuracy (%) | 79.86 | - | Unverified
2 | shufflenet-v2 (T: resnet-32x4, S: shufflenet-v2) | Top-1 Accuracy (%) | 78.76 | - | Unverified
3 | MV-MR (T: CLIP/ViT-B-16, S: resnet50) | Top-1 Accuracy (%) | 78.6 | - | Unverified
4 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 Accuracy (%) | 78.28 | - | Unverified
5 | resnet8x4 (T: resnet32x4, S: resnet8x4 [modified]) | Top-1 Accuracy (%) | 78.08 | - | Unverified
6 | ReviewKD++ (T: resnet-32x4, S: shufflenet-v2) | Top-1 Accuracy (%) | 77.93 | - | Unverified
7 | ReviewKD++ (T: resnet-32x4, S: shufflenet-v1) | Top-1 Accuracy (%) | 77.68 | - | Unverified
8 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 Accuracy (%) | 77.5 | - | Unverified
9 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 Accuracy (%) | 76.68 | - | Unverified
10 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 Accuracy (%) | 76.31 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | LSHFM (T: ResNet101, S: ResNet50) | mAP | 93.17 | - | Unverified
2 | LSHFM (T: ResNet101, S: MobileNetV2) | mAP | 90.14 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TIE-KD (T: Adabins, S: MobileNetV2) | RMSE | 2.43 | - | Unverified