
Student Customized Knowledge Distillation: Bridging the Gap Between Student and Teacher

ICCV 2021 · 2021-01-01

Yichen Zhu, Yi Wang


Abstract

Knowledge distillation (KD) transfers the dark knowledge from a cumbersome (teacher) network to a lightweight (student) network, with the expectation that the student achieves better performance than it would by training without the teacher's knowledge. However, a counter-intuitive finding is that better teachers do not necessarily make better students, due to the capacity mismatch between the two. To this end, we present a novel adaptive knowledge distillation method to complement traditional approaches. The proposed method, named Student Customized Knowledge Distillation (SCKD), examines the capacity mismatch between teacher and student from the perspective of gradient similarity. We formulate knowledge distillation as a multi-task learning problem, so that the teacher transfers knowledge to the student only if the student can benefit from learning that knowledge. We validate our method on multiple datasets with various teacher-student configurations on image classification, object detection, and semantic segmentation.
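The abstract's idea of gating distillation by gradient similarity can be sketched in a few lines. The following is an illustrative toy implementation, not the paper's actual algorithm: it assumes (hypothetically) that we have the flattened gradients of the student's task loss and of the KD loss, computes their cosine similarity, and applies the distillation gradient only when it does not conflict with the task gradient.

```python
import math

def cosine_similarity(g1, g2):
    # Cosine similarity between two flattened gradient vectors.
    dot = sum(a * b for a, b in zip(g1, g2))
    n1 = math.sqrt(sum(a * a for a in g1))
    n2 = math.sqrt(sum(b * b for b in g2))
    if n1 == 0.0 or n2 == 0.0:
        return 0.0
    return dot / (n1 * n2)

def gated_update(grad_task, grad_kd, lr=0.1):
    # Hypothetical gating rule sketched from the abstract: keep the
    # distillation gradient only when it aligns (positive cosine
    # similarity) with the student's own task gradient; otherwise
    # drop the conflicting teacher signal for this step.
    if cosine_similarity(grad_task, grad_kd) > 0.0:
        combined = [gt + gk for gt, gk in zip(grad_task, grad_kd)]
    else:
        combined = list(grad_task)
    # Return the parameter update for a plain gradient-descent step.
    return [-lr * g for g in combined]
```

For example, with aligned gradients `gated_update([1.0, 0.0], [1.0, 0.0])` uses both signals, while with opposing gradients `gated_update([1.0, 0.0], [-1.0, 0.0])` falls back to the task gradient alone. The actual paper operates per teacher-student layer pair and within a multi-task formulation; this sketch only conveys the gating intuition.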
