Adaptive Decoupled Pose Knowledge Distillation
Jie Xu, Shanshan Zhang, and Jian Yang
Abstract
Existing state-of-the-art human pose estimation approaches require heavy computational resources for accurate prediction. One promising technique for obtaining an accurate yet lightweight pose estimator is Knowledge Distillation (KD), which transfers pose knowledge from a powerful teacher model to a lightweight student model. However, existing human pose KD methods focus on designing paired student and teacher network architectures while ignoring the mechanism of pose knowledge distillation itself. In this work, we reformulate human pose KD as a coarse-to-fine process and decouple the classical KD loss into three terms: Binary Keypoint vs. Non-Keypoint Distillation (BiKD), Keypoint Area Distillation (KAD), and Non-Keypoint Area Distillation (NAD). From the decoupled formulation, we identify an important limitation of classical pose KD: the bias between the different loss terms limits the performance gain of the student network. To address this biased knowledge distillation problem, we present a novel KD method named Adaptive Decoupled Pose Knowledge Distillation (ADPD), enabling BiKD, KAD, and NAD to play their roles more effectively and flexibly. Extensive experiments on two standard human pose datasets, MPII and MS COCO, demonstrate that our proposed method outperforms previous KD methods and generalizes across different teacher-student pairs. The code will be available at https://github.com/SuperJay1996/ADPD.
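The decoupling described above follows from an exact identity: a KL divergence over heatmap pixels splits into a binary keypoint/non-keypoint term plus the two conditional divergences inside each area. The sketch below is a minimal, illustrative pure-Python version of that identity, not the authors' implementation; the weight parameters `w_bi`, `w_kad`, `w_nad` and the exact loss form are assumptions for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a flat list of heatmap logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl(p, q):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def decoupled_pose_kd(t_logits, s_logits, kp_mask,
                      w_bi=1.0, w_kad=1.0, w_nad=1.0):
    """Sketch of a decoupled pose-KD loss (names/weights are illustrative).

    Splits KL(teacher || student) over heatmap pixels into:
      BiKD: binary keypoint-area vs. non-keypoint-area divergence,
      KAD:  conditional divergence inside the keypoint area,
      NAD:  conditional divergence inside the non-keypoint area.
    With unit weights the sum reproduces the classical KD loss exactly.
    """
    p = softmax(t_logits)  # teacher heatmap distribution
    q = softmax(s_logits)  # student heatmap distribution
    pK = sum(pi for pi, m in zip(p, kp_mask) if m)  # teacher mass on keypoints
    qK = sum(qi for qi, m in zip(q, kp_mask) if m)  # student mass on keypoints

    # BiKD: two-bin distribution {keypoint area, non-keypoint area}
    bikd = kl([pK, 1 - pK], [qK, 1 - qK])

    # KAD: conditional distributions restricted to the keypoint area
    kad = kl([pi / pK for pi, m in zip(p, kp_mask) if m],
             [qi / qK for qi, m in zip(q, kp_mask) if m])

    # NAD: conditional distributions restricted to the non-keypoint area
    nad = kl([pi / (1 - pK) for pi, m in zip(p, kp_mask) if not m],
             [qi / (1 - qK) for qi, m in zip(q, kp_mask) if not m])

    total = w_bi * bikd + w_kad * pK * kad + w_nad * (1 - pK) * nad
    return total, (bikd, kad, nad)
```

Setting all three weights to 1 recovers the classical KD loss; making them independent (e.g. learned or scheduled) is what lets the three terms contribute without the fixed `pK`/`(1 - pK)` coupling that biases the classical formulation.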