
NysAct: A Scalable Preconditioned Gradient Descent using Nystrom Approximation

2025-06-10 · Code Available

Hyunseok Seung, Jaewoo Lee, Hyunsuk Ko


Abstract

Adaptive gradient methods are computationally efficient and converge quickly, but they often suffer from poor generalization. In contrast, second-order methods enhance convergence and generalization but typically incur high computational and memory costs. In this work, we introduce NysAct, a scalable first-order gradient preconditioning method that strikes a balance between state-of-the-art first-order and second-order optimization methods. NysAct leverages an eigenvalue-shifted Nystrom method to approximate the activation covariance matrix, which is used as a preconditioning matrix, significantly reducing time and memory complexity with minimal impact on test accuracy. Our experiments show that NysAct not only achieves improved test accuracy compared to both first-order and second-order methods but also demands considerably fewer computational resources than existing second-order methods. Code is available at https://github.com/hseung88/nysact.
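To make the idea in the abstract concrete, here is a minimal NumPy sketch of an eigenvalue-shifted Nystrom approximation of an activation covariance matrix, used to precondition a gradient. This is an illustration under our own assumptions (function names, uniform column sampling, and the damping constants are ours), not the paper's exact algorithm; see the repository above for the authors' implementation.

```python
import numpy as np

def nystrom_shifted(A, m, rho=1e-3, seed=None):
    """Rank-<=m Nystrom approximation of a PSD matrix A (d x d).

    Samples m columns uniformly at random; the shift rho stabilizes
    the pseudo-inverse of the small intersection block.
    """
    rng = np.random.default_rng(seed)
    d = A.shape[0]
    idx = rng.choice(d, size=m, replace=False)
    C = A[:, idx]                               # d x m column sample
    W = A[np.ix_(idx, idx)]                     # m x m intersection block
    W_inv = np.linalg.pinv(W + rho * np.eye(m)) # eigenvalue-shifted inverse
    return C @ W_inv @ C.T                      # symmetric, rank <= m

def precondition(A_hat, grad, eps=1e-3):
    """Apply the damped inverse of the approximated covariance to a gradient."""
    d = A_hat.shape[0]
    return np.linalg.solve(A_hat + eps * np.eye(d), grad)

# Toy usage: approximate the covariance of a batch of random activations.
rng = np.random.default_rng(0)
X = rng.standard_normal((256, 32))   # batch of 256 activations, dim 32
A = X.T @ X / X.shape[0]             # activation covariance (32 x 32)
A_hat = nystrom_shifted(A, m=8, seed=0)
g = rng.standard_normal(32)
pg = precondition(A_hat, g)          # preconditioned gradient direction
```

Because `A_hat` has rank at most `m`, storing and factoring it is far cheaper than working with the full covariance when `m << d`, which is the source of the time and memory savings the abstract describes.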
