SORSA: Singular Values and Orthonormal Regularized Singular Vectors Adaptation of Large Language Models

2024-08-21

Yang Cao


Code

github.com/Gunale0926/SORSA
Official · In paper · PyTorch · ★ 38

Abstract

In this paper, we propose Singular Values and Orthonormal Regularized Singular Vectors Adaptation (SORSA), a novel parameter-efficient fine-tuning (PEFT) method. Each SORSA adapter consists of two parts: trainable principal singular weights $W_p = U_p \operatorname{diag}(S_p) V_p^\top$ and frozen residual weights $W_r = U_r \operatorname{diag}(S_r) V_r^\top$. Both parts are initialized by performing singular value decomposition (SVD) on the pre-trained weights. We also implement and analyze an orthonormal regularizer, which we prove can decrease the condition number of $W_p$ and thereby make optimization more efficient. SORSA adapters can be merged into the base weights for inference, eliminating any added inference latency. In addition, we introduce a method for analyzing parameter variation via SVD, and we use it to analyze SORSA's advantage in minimizing alterations to the pre-trained weights from the SVD perspective. Overall, SORSA converges faster than LoRA and PiSSA in our experiments. On the GSM-8K benchmark, Llama 2 7B adapted with SORSA achieved 56.03% accuracy, surpassing LoRA (42.30%), AdaLoRA (47.30%), full fine-tuning (Full FT, 49.05%), and PiSSA (53.07%). On the MATH benchmark, SORSA achieved 10.36% accuracy, outperforming LoRA (5.50%), AdaLoRA (6.48%), Full FT (7.22%), and PiSSA (7.44%). We conclude that SORSA offers a new perspective on parameter-efficient fine-tuning and achieves strong performance.
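The code below is a minimal NumPy sketch of the decomposition, merging and regularizer described in the abstract. It is not the official PyTorch implementation at github.com/Gunale0926/SORSA.

Several details are assumptions made for illustration:

- the helper names `sorsa_init`, `principal_weight`, `orthonormal_reg` and `reg_grads`;
- the Frobenius-norm form of the regularizer;
- the rank `r=8`, the noise used to simulate drift during fine-tuning, and the learning rate.

The sketch runs gradient descent on the regularizer alone. This only shows its effect; it does not reproduce how the paper combines it with the task loss.

```python
import numpy as np


def sorsa_init(W, r):
    """Split pre-trained weight W (m x n) by SVD into a rank-r principal part
    (whose factors would be trainable) and a frozen residual matrix W_r."""
    U, S, Vh = np.linalg.svd(W, full_matrices=False)
    # Top-r singular triplets: U_p (m x r), S_p (r,), Vh_p = V_p^T (r x n)
    U_p, S_p, Vh_p = U[:, :r].copy(), S[:r].copy(), Vh[:r, :].copy()
    # Remaining triplets are multiplied out once and kept frozen
    W_r = U[:, r:] @ np.diag(S[r:]) @ Vh[r:, :]
    return U_p, S_p, Vh_p, W_r


def principal_weight(U_p, S_p, Vh_p):
    # W_p = U_p diag(S_p) V_p^T; broadcasting S_p scales each column of U_p
    return (U_p * S_p) @ Vh_p


def orthonormal_reg(U_p, Vh_p):
    # Assumed form: ||U_p^T U_p - I||_F^2 + ||V_p^T V_p - I||_F^2,
    # where V_p^T V_p = Vh_p @ Vh_p.T because Vh_p stores V_p^T
    I = np.eye(U_p.shape[1])
    return (np.linalg.norm(U_p.T @ U_p - I, "fro") ** 2
            + np.linalg.norm(Vh_p @ Vh_p.T - I, "fro") ** 2)


def reg_grads(U_p, Vh_p):
    # Analytic gradients of orthonormal_reg with respect to U_p and Vh_p
    I = np.eye(U_p.shape[1])
    g_U = 4.0 * U_p @ (U_p.T @ U_p - I)
    g_Vh = 4.0 * (Vh_p @ Vh_p.T - I) @ Vh_p
    return g_U, g_Vh


rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))
U_p, S_p, Vh_p, W_r = sorsa_init(W, r=8)

# At initialization the split is exact, so W_r + W_p can be merged back
# into a single matrix for inference
print(np.allclose(W_r + principal_weight(U_p, S_p, Vh_p), W))  # → True

# Simulate drift of the singular vectors during fine-tuning (hypothetical noise)
U_p += 0.05 * rng.standard_normal(U_p.shape)
Vh_p += 0.05 * rng.standard_normal(Vh_p.shape)
before = orthonormal_reg(U_p, Vh_p)
for _ in range(100):  # gradient descent on the regularizer alone
    g_U, g_Vh = reg_grads(U_p, Vh_p)
    U_p -= 0.01 * g_U
    Vh_p -= 0.01 * g_Vh
after = orthonormal_reg(U_p, Vh_p)
print(after < before)  # → True
```

After training, `W_r` stays frozen. `W_p` is an ordinary dense matrix once its factors are multiplied out. The adapted layer can therefore be collapsed into the single matrix `W_r + W_p` before deployment, which is why merging removes inference latency.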

Tasks

8k, GSM8K, Math, parameter-efficient fine-tuning
