SOTAVerified

SpectralKD: A Unified Framework for Interpreting and Distilling Vision Transformers via Spectral Analysis

2024-12-26 · Code Available

Huiyuan Tian, Bonan Xu, Shijian Li, Gang Pan


Abstract

Knowledge Distillation (KD) has achieved widespread success in compressing large Vision Transformers (ViTs), but a unified theoretical framework for both ViTs and KD is still lacking. In this paper, we propose SpectralKD, a novel unified analytical framework that offers deeper insights into ViTs and optimizes KD via spectral analysis. Our model-wise analysis reveals that CaiT concentrates information in its first and last few layers, informing optimal layer selection for KD. Surprisingly, our layer-wise analysis discovers that Swin Transformer and CaiT exhibit similar spectral encoding patterns despite their architectural differences, leading to a feature map alignment guideline. Building on these insights, we propose a simple yet effective spectral alignment method for KD. Benefiting from the deeper understanding provided by the above analysis, even this simple strategy achieves state-of-the-art performance on ImageNet-1K without introducing any trainable parameters, improving DeiT-Tiny by +5.2% and Swin-Tiny by +1.4% in top-1 accuracy. Furthermore, our post-training analysis reveals that distilled students reproduce spectral patterns similar to their teachers', opening a new area we term "distillation dynamics". Code and experimental logs are available at https://github.com/thy960112/SpectralKD.
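The kind of layer-wise spectral analysis the abstract describes (comparing ViT layers by the frequency content of their feature maps) can be sketched roughly as follows. This is a minimal NumPy illustration under our own assumptions about shapes and channel averaging, not the authors' implementation:

```python
import numpy as np

def layer_spectrum(feature_map):
    """Channel-averaged magnitude spectrum of one layer's feature map.

    feature_map: array of shape (C, H, W), e.g. a ViT block's output
    reshaped onto its spatial grid. Returns an (H, W) array with low
    frequencies shifted to the center. Hypothetical helper, assumed here
    for illustration only.
    """
    fft = np.fft.fft2(feature_map, axes=(-2, -1))
    shifted = np.fft.fftshift(fft, axes=(-2, -1))
    return np.abs(shifted).mean(axis=0)

# Toy usage: random stand-ins for two layers' feature maps on a
# 14x14 token grid with 8 channels.
rng = np.random.default_rng(0)
spec_a = layer_spectrum(rng.standard_normal((8, 14, 14)))
spec_b = layer_spectrum(rng.standard_normal((8, 14, 14)))

# Layers (or teacher/student pairs) could then be compared by their
# spectral energy, e.g. total magnitude or a low/high-frequency split.
print(spec_a.shape, float(np.abs(spec_a - spec_b).sum()))
```

A real pipeline would hook the transformer blocks to capture feature maps during a forward pass; the comparison metric between spectra is a design choice this sketch leaves open.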

Tasks

Benchmark Results

| Dataset  | Model                            | Metric           | Claimed | Verified | Status     |
|----------|----------------------------------|------------------|---------|----------|------------|
| ImageNet | SpectralKD (T: Swin-S, S: Swin-T)  | Top-1 accuracy % | 82.7    |          | Unverified |
| ImageNet | SpectralKD (T: CaiT-S24, S: DeiT-S) | Top-1 accuracy % | 82.2    |          | Unverified |
| ImageNet | SpectralKD (T: CaiT-S24, S: DeiT-T) | Top-1 accuracy % | 77.4    |          | Unverified |

Reproductions