
A second-order-like optimizer with adaptive gradient scaling for deep learning

2024-10-08

Jérôme Bolte, Ryan Boustany, Edouard Pauwels, Andrei Purica


Abstract

In this empirical article, we introduce INNAprop, an optimization algorithm that combines the INNA method with RMSprop adaptive gradient scaling. It leverages second-order information and rescaling while keeping the memory requirements of standard deep learning methods such as AdamW or SGD with momentum. After giving geometrical insights, we evaluate INNAprop on CIFAR-10, Food101, and ImageNet with ResNets, VGG, DenseNet, and ViT, and on GPT-2 (OpenWebText) trained from scratch and with LoRA fine-tuning (E2E). INNAprop consistently matches or outperforms AdamW both in training speed and accuracy, with minimal hyperparameter tuning in large-scale settings. Our code is publicly available at https://github.com/innaprop/innaprop.
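The abstract's key idea is combining an INNA-style inertial update (a two-variable, second-order-like recursion) with an RMSprop second-moment rescaling of the gradient. The toy NumPy sketch below only illustrates that combination; the exact recursion, the auxiliary-variable initialization, and all hyperparameter names here are assumptions for illustration, not the authors' algorithm (see the linked repository for the real implementation).

```python
import numpy as np

def innaprop_sketch(grad_fn, theta0, alpha=0.5, beta=0.1, lr=0.01,
                    rho=0.9, eps=1e-8, steps=100):
    """Illustrative sketch only: an INNA-style two-variable inertial
    update whose gradient term is rescaled by an RMSprop running
    average of squared gradients. Not the paper's exact recursion."""
    theta = np.asarray(theta0, dtype=float)
    # Assumed initialization making the inertial term vanish at step 0.
    psi = (1.0 - alpha * beta) * theta
    v = np.zeros_like(theta)  # RMSprop second-moment accumulator
    for _ in range(steps):
        g = grad_fn(theta)
        v = rho * v + (1.0 - rho) * g * g      # EMA of squared gradients
        g_scaled = g / (np.sqrt(v) + eps)      # RMSprop rescaling
        # INNA-style coupled update; alpha, beta control the damping
        # and the weight of the "Newton-like" geometric term.
        inertial = (1.0 / beta - alpha) * theta - psi / beta
        psi = psi + lr * inertial
        theta = theta + lr * (inertial - beta * g_scaled)
    return theta

# Minimize f(x) = ||x||^2 / 2, whose gradient is simply x.
x_star = innaprop_sketch(lambda th: th, theta0=[2.0, -1.0])
```

On this toy quadratic the iterates shrink toward the origin; the point of the sketch is only to show where the second-moment rescaling plugs into the inertial recursion, not to reproduce the paper's convergence behavior.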
