Reverse Knowledge Distillation: Training a Large Model using a Small One for Retinal Image Matching on Limited Data
Sahar Almahfouz Nasser, Nihar Gupte, Amit Sethi
- github.com/SaharAlmahfouzNasser/MeDAL-Retina (Official, ★ 11)
Abstract
Retinal image matching plays a crucial role in monitoring disease progression and treatment response. However, datasets with matched keypoints between temporally separated pairs of images are not available in abundance to train transformer-based models. We propose a novel approach based on reverse knowledge distillation to train large models with limited data while preventing overfitting. First, we propose architectural modifications to a CNN-based semi-supervised method called SuperRetina that improve its results on a publicly available dataset. Then, we train a computationally heavier model based on a vision transformer encoder using the lighter CNN-based model, which is counter-intuitive in knowledge-distillation research, where training lighter models from heavier ones is the norm. Surprisingly, such reverse knowledge distillation improves generalization even further. Our experiments suggest that fitting in a high-dimensional representation space may prevent overfitting, unlike training directly to match the final output. We also provide a public dataset with annotations for retinal image keypoint detection and matching to help the research community develop algorithms for retinal image applications.
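The core idea of reverse distillation here is that the heavier student is trained to match the lighter teacher's intermediate features rather than its final output. The following is a minimal, self-contained sketch of that representation-matching objective; the fixed random projection standing in for the trained CNN teacher, the linear map standing in for the transformer student, and all shapes and hyperparameters are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical batch: 16 inputs, 64-dim each, distilled into a 32-dim feature space.
X = rng.normal(size=(16, 64))

# "Teacher": the already-trained light CNN's embedding, stood in for here
# by a frozen random projection (illustrative assumption).
W_teacher = rng.normal(size=(64, 32))
teacher_feats = X @ W_teacher

# "Student": the heavier transformer encoder, stood in for by a trainable
# linear map with a different initialization.
W_student = rng.normal(size=(64, 32))

def distill_loss(W):
    # Reverse distillation matches representations, not final outputs:
    # mean-squared error between student and teacher feature maps.
    diff = X @ W - teacher_feats
    return np.mean(diff ** 2)

initial = distill_loss(W_student)
lr = 0.05
for _ in range(200):
    # Analytic gradient of the feature-space MSE w.r.t. the student weights.
    grad = 2.0 * X.T @ (X @ W_student - teacher_feats) / teacher_feats.size
    W_student -= lr * grad

final = distill_loss(W_student)
print(f"representation-matching loss: {initial:.3f} -> {final:.3f}")
```

In the paper's actual setup the teacher is the modified SuperRetina CNN and the student is a vision-transformer encoder, with the same principle: supervision flows through the high-dimensional feature space.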
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| FIRE | LKRetina | mAUC | 0.76 | — | Unverified |