
Knowledge Distillation

Knowledge distillation is the process of transferring knowledge from a large model to a smaller one. While large models (such as very deep neural networks or ensembles of many models) have higher knowledge capacity than small models, this capacity might not be fully utilized. Distillation exploits this by training a compact student model to reproduce the behavior of a larger teacher model, so that much of the teacher's accuracy can be retained at a fraction of the memory and inference cost.
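
Below is a minimal sketch of how this transfer is typically implemented in practice, assuming the classic soft-label recipe (the student matches the teacher's temperature-softened class probabilities alongside the usual hard-label loss) and a PyTorch setup. The function name, temperature T, and weight alpha are illustrative defaults, not values drawn from any paper listed on this page.

```python
# Minimal soft-label distillation loss sketch (PyTorch assumed).
# All names and hyperparameter values here are illustrative, not from a specific paper.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Blend a soft KL term (student vs. temperature-scaled teacher) with hard-label CE."""
    # Soft targets: KL divergence between temperature-softened distributions,
    # scaled by T^2 so gradient magnitudes stay comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Usage sketch: the teacher stays frozen; only the student is updated.
# with torch.no_grad():
#     teacher_logits = teacher(x)
# loss = distillation_loss(student(x), teacher_logits, y)
# loss.backward()
```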

Papers

Showing 1451–1500 of 4240 papers

Title | Status | Hype
Enhancing Scalability in Recommender Systems through Lottery Ticket Hypothesis and Knowledge Distillation-based Neural Network Pruning | – | 0
Exclusivity-Consistency Regularized Knowledge Distillation for Face Recognition | – | 0
Enhancing Romanian Offensive Language Detection through Knowledge Distillation, Multi-Task Learning, and Data Augmentation | – | 0
Enhancing Review Comprehension with Domain-Specific Commonsense | – | 0
Enhancing Once-For-All: A Study on Parallel Blocks, Skip Connections and Early Exits | – | 0
Expediting Contrastive Language-Image Pretraining via Self-distilled Encoders | – | 0
Experimentation in Content Moderation using RWKV | – | 0
Experimenting with Knowledge Distillation techniques for performing Brain Tumor Segmentation | – | 0
Explainability-Driven Leaf Disease Classification Using Adversarial Training and Knowledge Distillation | – | 0
Explainable Knowledge Distillation for On-device Chest X-Ray Classification | – | 0
Explainable LLM-driven Multi-dimensional Distillation for E-Commerce Relevance Learning | – | 0
Explaining Knowledge Distillation by Quantifying the Knowledge | – | 0
ConaCLIP: Exploring Distillation of Fully-Connected Knowledge Interaction Graph for Lightweight Text-Image Retrieval | – | 0
Explaining Sequence-Level Knowledge Distillation as Data-Augmentation for Neural Machine Translation | – | 0
A General Multiple Data Augmentation Based Framework for Training Deep Neural Networks | – | 0
Explicit Connection Distillation | – | 0
A Transformer-in-Transformer Network Utilizing Knowledge Distillation for Image Recognition | – | 0
Explicit Knowledge Transfer for Weakly-Supervised Code Generation | – | 0
FlyKD: Graph Knowledge Distillation on the Fly with Curriculum Learning | – | 0
Exploiting Unlabelled Photos for Stronger Fine-Grained SBIR | – | 0
Enhancing Modality-Agnostic Representations via Meta-Learning for Brain Tumor Segmentation | – | 0
Enhancing Mapless Trajectory Prediction through Knowledge Distillation | – | 0
Exploring compressibility of transformer based text-to-music (TTM) models | – | 0
Exploring Dark Knowledge under Various Teacher Capacities and Addressing Capacity Mismatch | – | 0
Compression of end-to-end non-autoregressive image-to-speech system for low-resourced devices | – | 0
Compression of Deep Learning Models for Text: A Survey | – | 0
Generalized Supervised Contrastive Learning | – | 0
Exploring Extreme Quantization in Spiking Language Models | – | 0
Compression of Acoustic Event Detection Models With Quantized Distillation | – | 0
Continual Learning for Class- and Domain-Incremental Semantic Segmentation | – | 0
FLAR: A Unified Prototype Framework for Few-Sample Lifelong Active Recognition | – | 0
For the Misgendered Chinese in Gender Bias Research: Multi-Task Learning with Knowledge Distillation for Pinyin Name-Gender Prediction | – | 0
Compressing Visual-linguistic Model via Knowledge Distillation | – | 0
Fully Synthetic Data Improves Neural Machine Translation with Knowledge Distillation | – | 0
Enhancing Generalization in Chain of Thought Reasoning for Smaller Models | – | 0
A Theoretical Analysis of Soft-Label vs Hard-Label Training in Neural Networks | – | 0
Enhancing Few-shot Keyword Spotting Performance through Pre-Trained Self-supervised Speech Models | – | 0
Enhancing Data-Free Adversarial Distillation with Activation Regularization and Virtual Interpolation | – | 0
Exploring Self- and Cross-Triplet Correlations for Human-Object Interaction Detection | – | 0
Unsupervised Continual Learning Via Pseudo Labels | – | 0
Continual Learning with Diffusion-based Generative Replay for Industrial Streaming Data | – | 0
A Note on Knowledge Distillation Loss Function for Object Classification | – | 0
Continual Learning with Dirichlet Generative-based Rehearsal | – | 0
Exploring the Limits of Simple Learners in Knowledge Distillation for Document Classification with DocBERT | – | 0
Extending Label Smoothing Regularization with Self-Knowledge Distillation | – | 0
Extracting General-use Transformers for Low-resource Languages via Knowledge Distillation | – | 0
Extracting knowledge from features with multilevel abstraction | – | 0
Compressing VAE-Based Out-of-Distribution Detectors for Embedded Deployment | – | 0
Extract then Distill: Efficient and Effective Task-Agnostic BERT Distillation | – | 0
Enhancing CTC-Based Visual Speech Recognition | – | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | ScaleKD (T:BEiT-L S:ViT-B/14) | Top-1 accuracy % | 86.43 | – | Unverified
2 | ScaleKD (T:Swin-L S:ViT-B/16) | Top-1 accuracy % | 85.53 | – | Unverified
3 | ScaleKD (T:Swin-L S:ViT-S/16) | Top-1 accuracy % | 83.93 | – | Unverified
4 | ScaleKD (T:Swin-L S:Swin-T) | Top-1 accuracy % | 83.8 | – | Unverified
5 | KD++ (T: regnety-16GF S:ViT-B) | Top-1 accuracy % | 83.6 | – | Unverified
6 | VkD (T:RegNety 160 S:DeiT-S) | Top-1 accuracy % | 82.9 | – | Unverified
7 | SpectralKD (T:Swin-S S:Swin-T) | Top-1 accuracy % | 82.7 | – | Unverified
8 | ScaleKD (T:Swin-L S:ResNet-50) | Top-1 accuracy % | 82.55 | – | Unverified
9 | DiffKD (T:Swin-L S: Swin-T) | Top-1 accuracy % | 82.5 | – | Unverified
10 | DIST (T: Swin-L S: Swin-T) | Top-1 accuracy % | 82.3 | – | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SRD (T:resnet-32x4, S:shufflenet-v2) | Top-1 Accuracy (%) | 79.86 | – | Unverified
2 | shufflenet-v2 (T:resnet-32x4, S:shufflenet-v2) | Top-1 Accuracy (%) | 78.76 | – | Unverified
3 | MV-MR (T: CLIP/ViT-B-16 S: resnet50) | Top-1 Accuracy (%) | 78.6 | – | Unverified
4 | resnet8x4 (T: resnet32x4 S: resnet8x4) | Top-1 Accuracy (%) | 78.28 | – | Unverified
5 | resnet8x4 (T: resnet32x4 S: resnet8x4 [modified]) | Top-1 Accuracy (%) | 78.08 | – | Unverified
6 | ReviewKD++ (T:resnet-32x4, S:shufflenet-v2) | Top-1 Accuracy (%) | 77.93 | – | Unverified
7 | ReviewKD++ (T:resnet-32x4, S:shufflenet-v1) | Top-1 Accuracy (%) | 77.68 | – | Unverified
8 | resnet8x4 (T: resnet32x4 S: resnet8x4) | Top-1 Accuracy (%) | 77.5 | – | Unverified
9 | resnet8x4 (T: resnet32x4 S: resnet8x4) | Top-1 Accuracy (%) | 76.68 | – | Unverified
10 | resnet8x4 (T: resnet32x4 S: resnet8x4) | Top-1 Accuracy (%) | 76.31 | – | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | LSHFM (T: ResNet101 S: ResNet50) | mAP | 93.17 | – | Unverified
2 | LSHFM (T: ResNet101 S: MobileNetV2) | mAP | 90.14 | – | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TIE-KD (T: Adabins S: MobileNetV2) | RMSE | 2.43 | – | Unverified