LoRA: Low-Rank Adaptation of Large Language Models
Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen
Code
- github.com/microsoft/LoRA (official, in paper, JAX) ★ 13,360
- github.com/hiyouga/llama-efficient-tuning (PyTorch) ★ 68,894
- github.com/labmlai/annotated_deep_learning_paper_implementations (PyTorch) ★ 66,103
- github.com/tatsu-lab/stanford_alpaca (PyTorch) ★ 30,255
- github.com/QwenLM/Qwen-7B (PyTorch) ★ 20,802
- github.com/qwenlm/qwen (PyTorch) ★ 20,797
- github.com/tloen/alpaca-lora (PyTorch) ★ 18,960
- github.com/flagalpha/llama2-chinese (PyTorch) ★ 14,741
- github.com/llamafamily/llama-chinese (PyTorch) ★ 14,739
- github.com/mistralai/mistral-inference (PyTorch) ★ 10,731
Abstract
An important paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, full fine-tuning, which retrains all model parameters, becomes less feasible. Using GPT-3 175B as an example -- deploying independent instances of fine-tuned models, each with 175B parameters, is prohibitively expensive. We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks. Compared to GPT-3 175B fine-tuned with Adam, LoRA can reduce the number of trainable parameters by 10,000 times and the GPU memory requirement by 3 times. LoRA performs on-par or better than fine-tuning in model quality on RoBERTa, DeBERTa, GPT-2, and GPT-3, despite having fewer trainable parameters, a higher training throughput, and, unlike adapters, no additional inference latency. We also provide an empirical investigation into rank-deficiency in language model adaptation, which sheds light on the efficacy of LoRA. We release a package that facilitates the integration of LoRA with PyTorch models and provide our implementations and model checkpoints for RoBERTa, DeBERTa, and GPT-2 at https://github.com/microsoft/LoRA.
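The low-rank update described in the abstract can be sketched in a few lines. The following is a minimal illustration in NumPy rather than the authors' PyTorch package: the frozen pretrained weight `W0` is augmented with trainable factors `B` and `A` so the adapted weight is `W0 + (alpha/r) * B @ A`. The dimensions, the `alpha` value, and the function name `lora_forward` are illustrative, not taken from the paper's released code.

```python
import numpy as np

def lora_forward(x, W0, A, B, alpha=16, r=4):
    # Frozen pretrained weight W0 (d_out x d_in); trainable low-rank
    # factors B (d_out x r) and A (r x d_in). The adapted weight is
    # W0 + (alpha / r) * B @ A, but we never materialize it: the
    # low-rank path is applied to the activation directly.
    scaling = alpha / r
    return W0 @ x + scaling * (B @ (A @ x))

rng = np.random.default_rng(0)
d_out, d_in, r = 8, 16, 4
W0 = rng.standard_normal((d_out, d_in))
A = rng.standard_normal((r, d_in)) * 0.01  # A starts small (paper: random Gaussian)
B = np.zeros((d_out, r))                   # B starts at zero, so BA = 0 initially
x = rng.standard_normal(d_in)

# With B = 0, the adapted model reproduces the pretrained model exactly.
assert np.allclose(lora_forward(x, W0, A, B, r=r), W0 @ x)
```

Because `B` is initialized to zero, training starts from the pretrained model's behavior, and at deployment time `B @ A` can be merged into `W0`, which is why LoRA adds no inference latency compared to adapter layers.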