Improving Representational Continuity via Continued Pretraining
Michael Sun, Ananya Kumar, Divyam Madaan, Percy Liang
Code (official, PyTorch): github.com/shiningsunnyday/ucl
Abstract
We consider the continual representation learning setting: sequentially pretrain a model M' on tasks T_1, ..., T_T, and then adapt M' on a small amount of data from each task T_i to check whether it has forgotten information from old tasks. Under a kNN adaptation protocol, prior work shows that continual learning methods improve forgetting over naive training (SGD). In reality, practitioners do not use kNN classifiers; they use the adaptation method that works best (e.g., fine-tuning), and under this protocol we find that strong continual learning baselines do worse than naive training. Interestingly, we find that a method from the transfer learning community (LP-FT) outperforms both naive training and the other continual learning methods. Even under the standard kNN evaluation protocol, LP-FT performs comparably with strong continual learning methods (while being simpler and requiring less memory) on three standard benchmarks: sequential CIFAR-10, CIFAR-100, and TinyImageNet. LP-FT also reduces forgetting on a real-world satellite remote sensing dataset (FMoW), and a variant of LP-FT achieves state-of-the-art accuracies on an NLP continual learning benchmark.
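For concreteness, here is a minimal sketch of the LP-FT adaptation recipe (linear probing followed by full fine-tuning) in PyTorch. It assumes a model split into a feature `backbone` and a linear `head`; the function name, hyperparameters, and `loader` are illustrative, not the paper's exact setup.

```python
import torch.nn as nn
import torch.optim as optim

def lp_ft(backbone, head, loader, lp_epochs=5, ft_epochs=5,
          lp_lr=1e-2, ft_lr=1e-4, device="cpu"):
    """Adapt a pretrained backbone with LP-FT:
    (1) linear probe: train only the head on frozen features,
    (2) fine-tune: unfreeze everything and train at a small LR."""
    model = nn.Sequential(backbone, head).to(device)
    loss_fn = nn.CrossEntropyLoss()

    # Phase 1: linear probing -- freeze the backbone, train the head.
    for p in backbone.parameters():
        p.requires_grad = False
    opt = optim.SGD(head.parameters(), lr=lp_lr)
    for _ in range(lp_epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()

    # Phase 2: fine-tuning -- unfreeze all parameters and train
    # end to end, typically at a smaller learning rate.
    for p in backbone.parameters():
        p.requires_grad = True
    opt = optim.SGD(model.parameters(), lr=ft_lr)
    for _ in range(ft_epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model
```

The two-phase structure is the point: probing first fits a reasonable head on the frozen representation, so the subsequent fine-tuning starts from a head that does not distort the pretrained features with large early gradients.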