Not All Knowledge Is Created Equal: Mutual Distillation of Confident Knowledge
Ziyun Li, Xinshao Wang, Di Hu, Neil M. Robertson, David A. Clifton, Christoph Meinel, Haojin Yang
Abstract
Mutual knowledge distillation (MKD) improves a model by distilling knowledge from another model. However, not all knowledge is certain and correct, especially under adverse conditions. For example, label noise usually leads to less reliable models due to undesired memorization (Zhang et al., 2017; Arpit et al., 2017). Wrong knowledge misleads learning rather than helping it. This problem can be addressed from two aspects: (i) improving the reliability of the model the knowledge comes from (i.e., the knowledge source's reliability); (ii) selecting reliable knowledge for distillation. In the literature, making a model more reliable is widely studied, while selective MKD receives little attention. Therefore, we focus on selective MKD. Concretely, we design a generic MKD framework, Confident knowledge selection followed by Mutual Distillation (CMD). The key component of CMD is a generic knowledge selection formulation in which the selection threshold is either static (CMD-S) or progressive (CMD-P). Additionally, CMD covers two special cases, zero knowledge and all knowledge, leading to a unified MKD framework. Extensive experiments are presented to demonstrate the effectiveness of CMD and thoroughly justify its design. For example, CMD-P obtains new state-of-the-art results in robustness against label noise.
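To make the selection idea concrete, below is a minimal PyTorch sketch of one direction of confident-knowledge mutual distillation. It is an illustrative reading of the abstract, not the paper's exact formulation: the function names (`cmd_distillation_loss`, `progressive_threshold`), the use of the peer's maximum softmax probability as the confidence measure, and the linear threshold schedule are all assumptions.

```python
import torch
import torch.nn.functional as F

def cmd_distillation_loss(student_logits, peer_logits, tau, temperature=2.0):
    """One direction of confident-knowledge mutual distillation (assumed form):
    the student only learns from peer predictions whose maximum softmax
    probability exceeds the confidence threshold tau."""
    with torch.no_grad():
        peer_probs = F.softmax(peer_logits, dim=1)
        confidence = peer_probs.max(dim=1).values
        mask = confidence >= tau  # confident-knowledge selection

    # Special case: no sample passes the threshold (zero knowledge).
    if mask.sum() == 0:
        return student_logits.new_zeros(())

    # Temperature-scaled distributions; the peer is detached so gradients
    # flow only into the student for this direction of distillation.
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    p_peer = F.softmax(peer_logits.detach() / temperature, dim=1)

    # Per-sample KL divergence, averaged over the selected samples only.
    kl = F.kl_div(log_p_student, p_peer, reduction="none").sum(dim=1)
    return (temperature ** 2) * (kl * mask.float()).sum() / mask.float().sum()

def progressive_threshold(epoch, total_epochs, tau_min=0.0, tau_max=0.9):
    """An assumed CMD-P-style schedule: a linearly growing threshold.
    tau_min = 0 distills all knowledge, while a threshold near 1 approaches
    the zero-knowledge case, recovering the two special cases above."""
    return tau_min + (tau_max - tau_min) * epoch / max(total_epochs - 1, 1)
```

In mutual distillation the loss would be applied symmetrically: each model adds `cmd_distillation_loss(its_logits, peer_logits, tau)` to its own supervised cross-entropy term, with `tau` held fixed for CMD-S or updated each epoch via a schedule like `progressive_threshold` for CMD-P.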