A second-order-like optimizer with adaptive gradient scaling for deep learning
Jérôme Bolte, Ryan Boustany, Edouard Pauwels, Andrei Purica
Abstract
In this empirical article, we introduce INNAprop, an optimization algorithm that combines the INNA method with the RMSprop adaptive gradient scaling. It leverages second-order information and rescaling while keeping the memory requirements of standard DL methods as AdamW or SGD with momentum. After giving geometrical insights, we evaluate INNAprop on CIFAR-10, Food101, and ImageNet with ResNets, VGG, DenseNet, and ViT, and on GPT-2 (OpenWebText) train from scratch and with LoRA fine-tuning (E2E). INNAprop consistently matches or outperforms AdamW both in training speed and accuracy, with minimal hyperparameter tuning in large-scale settings. Our code is publicly available at https://github.com/innaprop/innaprop.