Balancing LoRA Performance and Efficiency with Simple Shard Sharing
Jiale Kang, Qingyu Yin
Code
- github.com/jl-er/disha (official implementation)
- github.com/jl-er/bone (official implementation)
- github.com/jl-er/rwkv-peft
Abstract
Parameter-Efficient Fine-Tuning (PEFT) methods, particularly Low-Rank Adaptation (LoRA), effectively reduce the number of trainable parameters in Large Language Models (LLMs). However, as model scales continue to grow, the demand for computational resources remains a significant challenge. Existing LoRA variants often struggle to strike an optimal balance between adaptability (model performance and convergence speed) and efficiency (computational overhead, memory usage, and initialization time). This paper introduces FOSSIL (Framework for Optimal Shard Sharing Integration in LoRA), a novel PEFT approach that addresses this trade-off through a simple shard-sharing mechanism. FOSSIL builds on the insight that a low-rank adaptation can be achieved by partitioning the weight matrix into multiple shards and training a single shared shard: the low-rank update matrix is constructed entirely by replicating that shared shard across the partition. We also propose a hardware-efficient and broadly applicable implementation of FOSSIL. Extensive experiments on a range of tasks, together with a systematic analysis of computational performance, demonstrate FOSSIL's superiority: it significantly outperforms standard LoRA and its prominent variants in both model performance and computational efficiency, including initialization speed and training throughput. By effectively balancing expressive power and resource utilization, FOSSIL offers a compelling solution for efficiently adapting large-scale models.
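To make the shard-sharing idea concrete, here is a minimal sketch of a linear layer whose update is built by tiling one small trainable shard over the frozen weight's shape. The class name FossilLinear, the shard_size parameter, the zero initialization, and the exact tiling layout are all illustrative assumptions for this sketch, not the paper's verified construction or its hardware-efficient kernel.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FossilLinear(nn.Module):
    """Sketch of shard sharing: a frozen base weight plus a low-rank
    update formed by replicating one small trainable shard."""

    def __init__(self, base: nn.Linear, shard_size: int = 64):
        super().__init__()
        out_f, in_f = base.weight.shape  # nn.Linear stores (out, in)
        assert out_f % shard_size == 0 and in_f % shard_size == 0
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # freeze the pretrained weight
        # One shared trainable shard; zero init makes the update a
        # no-op at the start of training, as is common in PEFT.
        self.shard = nn.Parameter(torch.zeros(shard_size, shard_size))
        self.tiles = (out_f // shard_size, in_f // shard_size)

    def delta(self) -> torch.Tensor:
        # Replicate the shared shard to cover the full weight shape.
        return self.shard.repeat(*self.tiles)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + F.linear(x, self.delta())

# Example: wrap a 4096x4096 projection; only the 64x64 shard trains.
# layer = FossilLinear(nn.Linear(4096, 4096), shard_size=64)
```

Note that tiling an s-by-s shard over the full matrix equals the Kronecker product of an all-ones matrix with the shard, so the update has rank at most s: the construction is genuinely low-rank while storing only s^2 trainable parameters per layer. A hardware-efficient implementation would also avoid materializing delta() at all, since summing the input over its in_f // s blocks, multiplying by the shard once, and broadcasting the result across output blocks yields the same update with far less memory traffic; this is the kind of optimization the paper's efficient implementation plausibly targets.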