
EA-KD: Entropy-based Adaptive Knowledge Distillation

2023-11-22

Chi-Ping Su, Ching-Hsun Tseng, Bin Pu, Lei Zhao, Zhuangzhuang Chen, Shin-Jye Lee


Abstract

Knowledge distillation (KD) enables a smaller "student" model to mimic a larger "teacher" model by transferring knowledge from the teacher's outputs or features. However, most KD methods treat all samples uniformly, overlooking the varying learning value of each sample and thereby limiting effectiveness. In this paper, we propose Entropy-based Adaptive Knowledge Distillation (EA-KD), a simple yet effective plug-and-play KD method that prioritizes learning from valuable samples. EA-KD quantifies each sample's learning value by strategically combining the entropies of the teacher's and student's outputs, then dynamically reweights the distillation loss to place greater emphasis on high-value samples. Extensive experiments across diverse KD frameworks and tasks, including image classification, object detection, and large language model (LLM) distillation, demonstrate that EA-KD consistently enhances performance, achieving state-of-the-art results with negligible computational cost. Our code will be publicly available.
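The entropy-based reweighting described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the combination rule (summing teacher and student entropies and normalizing to a mean weight of 1 over the batch), the temperature, and the per-sample KL distillation loss are all assumptions made for the example.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class dimension.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def entropy(probs, eps=1e-12):
    # Shannon entropy per sample: high entropy = more uncertain prediction.
    return -(probs * np.log(probs + eps)).sum(axis=-1)

def ea_kd_weights(teacher_logits, student_logits):
    # Hypothetical combination: sum the teacher's and student's output
    # entropies per sample, then normalize so weights average to 1 across
    # the batch. The paper's actual combination may differ.
    w = entropy(softmax(teacher_logits)) + entropy(softmax(student_logits))
    return w * len(w) / w.sum()

def ea_kd_loss(teacher_logits, student_logits, T=4.0, eps=1e-12):
    # Per-sample KL(teacher || student) at temperature T, reweighted by the
    # entropy-based sample weights so high-value samples contribute more.
    p = softmax(teacher_logits / T)
    q = softmax(student_logits / T)
    kl = (p * (np.log(p + eps) - np.log(q + eps))).sum(axis=-1)
    w = ea_kd_weights(teacher_logits, student_logits)
    return (w * kl).mean() * T * T
```

Because the reweighting only scales per-sample loss terms, it can be dropped into most existing KD pipelines with negligible extra cost, which matches the plug-and-play claim in the abstract.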
