SOTAVerified

Improving Representational Continuity via Continued Pretraining

2023-02-26 · Code Available

Michael Sun, Ananya Kumar, Divyam Madaan, Percy Liang


Abstract

We consider the continual representation learning setting: sequentially pretrain a model M' on tasks T_1, ..., T_T, and then adapt M' on a small amount of data from each task T_i to check whether it has forgotten information from old tasks. Under a kNN adaptation protocol, prior work shows that continual learning methods improve forgetting over naive training (SGD). In practice, however, practitioners do not use kNN classifiers -- they use the adaptation method that works best (e.g., fine-tuning) -- and under fine-tuning we find that strong continual learning baselines do worse than naive training. Interestingly, a method from the transfer learning community (LP-FT) outperforms naive training and the other continual learning methods. Even with standard kNN evaluation protocols, LP-FT performs comparably with strong continual learning methods (while being simpler and requiring less memory) on three standard benchmarks: sequential CIFAR-10, CIFAR-100, and TinyImageNet. LP-FT also reduces forgetting on a real-world satellite remote sensing dataset (FMoW), and a variant of LP-FT achieves state-of-the-art accuracy on an NLP continual learning benchmark.
