Retro: Reusing teacher projection head for efficient embedding distillation on Lightweight Models via Self-supervised Learning

2024-05-24

Khanh-Binh Nguyen, Chae Jung Park

Abstract

Self-supervised learning (SSL) is gaining attention for its ability to learn effective representations from large amounts of unlabeled data. Lightweight models can be distilled from larger self-supervised pre-trained models using contrastive and consistency constraints. However, the teacher's and student's projection heads differ in size, which makes it difficult for the student to mimic the teacher's embedding accurately. We propose Retro, which reuses the teacher's projection head for the student, and our experimental results demonstrate significant improvements over the state of the art on all lightweight models. For instance, when training EfficientNet-B0 with ResNet-50/101/152 as teachers, our approach improves the linear evaluation accuracy on ImageNet to 66.9%, 69.3%, and 69.8%, respectively, with significantly fewer parameters.
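The core idea in the abstract, a student that embeds through the teacher's projection head rather than through its own smaller one, can be illustrated with a toy sketch. This is not the authors' implementation: the two-layer MLP head, the linear adapter that lifts student features to the teacher's feature width, the cosine-distance consistency loss, the toy dimensions, and every name below are assumptions made for illustration only. It uses only the Python standard library so that it runs as written.

```python
import math
import random


def init_linear(rng, out_dim, in_dim):
    """Random weights and zero bias for a toy linear layer."""
    scale = 1.0 / math.sqrt(in_dim)
    weights = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
               for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def linear(weights, bias, x):
    """Compute W x + b for a single input vector."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, a) for a in v]


def l2_normalize(v):
    norm = math.sqrt(sum(a * a for a in v)) or 1.0  # avoid division by zero
    return [a / norm for a in v]


class ProjectionHead:
    """Hypothetical two-layer MLP head; the real head design is an assumption."""

    def __init__(self, rng, in_dim, hidden_dim, out_dim):
        self.w1, self.b1 = init_linear(rng, hidden_dim, in_dim)
        self.w2, self.b2 = init_linear(rng, out_dim, hidden_dim)

    def __call__(self, x):
        hidden = relu(linear(self.w1, self.b1, x))
        return l2_normalize(linear(self.w2, self.b2, hidden))


rng = random.Random(0)

# Toy sizes chosen for readability, not taken from the paper.
TEACHER_DIM, STUDENT_DIM, HIDDEN_DIM, EMBED_DIM = 8, 5, 8, 4

# Stands in for the pre-trained teacher head, which would be kept frozen.
teacher_head = ProjectionHead(rng, TEACHER_DIM, HIDDEN_DIM, EMBED_DIM)

# Assumed adapter: the only new trainable piece on the student side.
adapter_w, adapter_b = init_linear(rng, TEACHER_DIM, STUDENT_DIM)


def student_embed(student_feat):
    """Lift student features to the teacher width, then reuse the teacher head."""
    return teacher_head(linear(adapter_w, adapter_b, student_feat))


def cosine_distance(a, b):
    """1 - cosine similarity; inputs are already L2-normalized."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


# Random vectors stand in for backbone outputs on the same image.
teacher_feat = [rng.gauss(0.0, 1.0) for _ in range(TEACHER_DIM)]
student_feat = [rng.gauss(0.0, 1.0) for _ in range(STUDENT_DIM)]

z_t = teacher_head(teacher_feat)
z_s = student_embed(student_feat)
loss = cosine_distance(z_s, z_t)  # consistency term a student would minimize
```

Because both embeddings come out of the same head, they live in the same space by construction, and the student only has to learn its backbone and the adapter. In a real setup the backbones would be networks such as EfficientNet-B0 and ResNet-50, and the head's parameters would be excluded from the optimizer.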

Tasks

Self-Supervised Learning
