Knowledge distillation via softmax regression representation learning

2021-01-01 · ICLR 2021

Jing Yang, Brais Martinez, Adrian Bulat, Georgios Tzimiropoulos


Abstract

This paper addresses the problem of model compression via knowledge distillation. We advocate for a method that optimizes the output feature of the penultimate layer of the student network and is therefore directly related to representation learning. Previous distillation methods, which typically impose direct feature matching between the student and the teacher, do not take into account the classification problem at hand. In contrast, our distillation method decouples representation learning from classification and uses the teacher's pre-trained classifier to train the student's penultimate-layer feature. In particular, for the same input image, we want the teacher's and the student's features to produce the same output when passed through the teacher's classifier, which is achieved with a simple L2 loss. Our method is extremely simple to implement and straightforward to train, and it is shown to consistently outperform previous state-of-the-art methods over a large set of experimental settings, including different (a) network architectures, (b) teacher-student capacities, (c) datasets, and (d) domains.
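The core loss described above can be illustrated with a minimal NumPy sketch. It assumes the teacher's classifier is a single linear layer (weights `W_t`, bias `b_t`) followed by a softmax, and that the L2 loss compares the two softmax outputs; applying it to the logits instead is another plausible reading of the abstract. The projection `W_conn`, used when the student's feature width differs from the teacher's, is a hypothetical connector not named in the abstract, as are all function and variable names here.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class (last) axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def softmax_regression_loss(feat_student, feat_teacher, W_t, b_t, W_conn=None):
    """Hypothetical sketch of the teacher-classifier L2 loss.

    feat_student: (N, d_s) student penultimate-layer features
    feat_teacher: (N, d_t) teacher penultimate-layer features, same images
    W_t: (d_t, C), b_t: (C,) teacher classifier (kept frozen)
    W_conn: (d_s, d_t) assumed connector; None when d_s == d_t
    """
    if W_conn is not None:
        # Assumption: a linear map brings student features to the teacher's width.
        feat_student = feat_student @ W_conn
    # Pass BOTH features through the same teacher classifier.
    p_student = softmax(feat_student @ W_t + b_t)
    p_teacher = softmax(feat_teacher @ W_t + b_t)
    # Squared L2 distance per sample, averaged over the batch.
    return float(np.mean(np.sum((p_student - p_teacher) ** 2, axis=-1)))

# Toy usage with random data (shapes are illustrative only).
rng = np.random.default_rng(0)
N, d_s, d_t, C = 4, 8, 16, 10
W_t = rng.normal(size=(d_t, C))
b_t = np.zeros(C)
f_t = rng.normal(size=(N, d_t))
f_s = rng.normal(size=(N, d_s))
W_conn = rng.normal(size=(d_s, d_t)) * 0.1

print(softmax_regression_loss(f_s, f_t, W_t, b_t, W_conn))
print(softmax_regression_loss(f_t, f_t, W_t, b_t))  # identical features -> 0.0
```

In an autograd framework the same computation would let gradients flow only into the student (and the assumed connector), while `W_t` and `b_t` stay fixed; this is what ties the student's representation to the teacher's decision boundaries rather than to the teacher's raw feature values.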

Tasks

Knowledge Distillation, Model Compression, Regression, Representation Learning
