Knowledge Distillation
Knowledge distillation is the process of transferring knowledge from a large model to a smaller one. While large models (such as very deep neural networks or ensembles of many models) have higher knowledge capacity than small models, this capacity might not be fully utilized; distillation trains a compact student model to reproduce the behavior of a larger teacher, retaining much of the teacher's accuracy at a fraction of its inference cost.
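As a concrete illustration, the classic logit-matching formulation (Hinton et al., 2015) trains the student on a weighted sum of two terms: a cross-entropy loss on the ground-truth labels and a KL-divergence loss that pulls the student's temperature-softened predictions toward the teacher's. The sketch below is a minimal PyTorch version; the temperature `T`, the weight `alpha`, and the function name are illustrative choices, not taken from any specific entry in the tables below.

```python
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Hinton-style KD loss: soft-target KL term plus hard-label cross-entropy.

    T (temperature) and alpha (soft/hard weighting) are illustrative defaults.
    """
    # KL divergence between temperature-softened distributions. The T**2
    # factor keeps soft-target gradient magnitudes comparable across
    # temperatures (Hinton et al., 2015).
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Standard cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard


# Toy usage: batch of 8 examples, 100 classes; the teacher is frozen,
# so its logits are computed without tracking gradients.
student_logits = torch.randn(8, 100, requires_grad=True)
with torch.no_grad():
    teacher_logits = torch.randn(8, 100)
labels = torch.randint(0, 100, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```

Many of the methods in the tables below (feature-mimicking, review-based, and diffusion-based variants) replace or augment this logit-matching term, but the teacher/student training setup is the same.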
Benchmark Results
Each table below is a separate leaderboard; in each entry, T denotes the teacher model and S the student model distilled from it.

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | ScaleKD (T: BEiT-L, S: ViT-B/14) | Top-1 accuracy (%) | 86.43 | — | Unverified |
| 2 | ScaleKD (T: Swin-L, S: ViT-B/16) | Top-1 accuracy (%) | 85.53 | — | Unverified |
| 3 | ScaleKD (T: Swin-L, S: ViT-S/16) | Top-1 accuracy (%) | 83.93 | — | Unverified |
| 4 | ScaleKD (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 83.8 | — | Unverified |
| 5 | KD++ (T: RegNetY-16GF, S: ViT-B) | Top-1 accuracy (%) | 83.6 | — | Unverified |
| 6 | VkD (T: RegNetY-160, S: DeiT-S) | Top-1 accuracy (%) | 82.9 | — | Unverified |
| 7 | SpectralKD (T: Swin-S, S: Swin-T) | Top-1 accuracy (%) | 82.7 | — | Unverified |
| 8 | ScaleKD (T: Swin-L, S: ResNet-50) | Top-1 accuracy (%) | 82.55 | — | Unverified |
| 9 | DiffKD (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 82.5 | — | Unverified |
| 10 | DIST (T: Swin-L, S: Swin-T) | Top-1 accuracy (%) | 82.3 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | SRD (T: resnet32x4, S: shufflenet-v2) | Top-1 accuracy (%) | 79.86 | — | Unverified |
| 2 | shufflenet-v2 (T: resnet32x4, S: shufflenet-v2) | Top-1 accuracy (%) | 78.76 | — | Unverified |
| 3 | MV-MR (T: CLIP ViT-B/16, S: ResNet-50) | Top-1 accuracy (%) | 78.6 | — | Unverified |
| 4 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 78.28 | — | Unverified |
| 5 | resnet8x4 (T: resnet32x4, S: resnet8x4 [modified]) | Top-1 accuracy (%) | 78.08 | — | Unverified |
| 6 | ReviewKD++ (T: resnet32x4, S: shufflenet-v2) | Top-1 accuracy (%) | 77.93 | — | Unverified |
| 7 | ReviewKD++ (T: resnet32x4, S: shufflenet-v1) | Top-1 accuracy (%) | 77.68 | — | Unverified |
| 8 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 77.5 | — | Unverified |
| 9 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 76.68 | — | Unverified |
| 10 | resnet8x4 (T: resnet32x4, S: resnet8x4) | Top-1 accuracy (%) | 76.31 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | LSHFM (T: ResNet101, S: ResNet50) | mAP | 77.16 | — | Unverified |
| 2 | LSHFM (T: ResNet101, S: MobileNetV2) | mAP | 73.73 | — | Unverified |
| 3 | ADLIK-Faster (T: Faster R-CNN (ViT-Base), S: Faster R-CNN (DeiT-Small)) | box AP | 47.6 | — | Unverified |
| 4 | ADLIK-Mask (T: Mask R-CNN (ViT-Base), S: Mask R-CNN (DeiT-Small)) | mask AP | 42.4 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | ReviewKD++ (T: Faster R-CNN (ResNet101), S: Faster R-CNN (ResNet50)) | AP@0.5 | 61.8 | — | Unverified |
| 2 | ReviewKD++ (T: Faster R-CNN (ResNet101), S: Faster R-CNN (ResNet18)) | AP@0.5 | 57.96 | — | Unverified |
| 3 | ReviewKD++ (T: Faster R-CNN (ResNet101), S: Faster R-CNN (MobileNetV2)) | AP@0.5 | 55.18 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | LSHFM (T: ResNet101, S: ResNet50) | mAP | 93.17 | — | Unverified |
| 2 | LSHFM (T: ResNet101, S: MobileNetV2) | mAP | 90.14 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | TIE-KD (T: AdaBins, S: MobileNetV2) | RMSE | 2.43 | — | Unverified |