LLM-Neo: Parameter Efficient Knowledge Distillation for Large Language Models

2024-11-11

Runming Yang, Taiqiang Wu, Jiahao Wang, Pengfei Hu, Ngai Wong, Yujiu Yang


Code

github.com/wutaiqiang/shadow-ft (PyTorch, ★ 42)
github.com/yang3121099/LLM-Neo (PyTorch, ★ 15)

Abstract

In this paper, we propose LLM-Neo, a novel framework that efficiently transfers knowledge from a large language model (LLM) teacher to a compact student. We first revisit knowledge distillation (KD) and low-rank adaptation (LoRA) and argue that they share the same paradigm. Inspired by this observation, we explore a strategy that combines LoRA and KD to make knowledge transfer more efficient. We then summarize guidelines for this design and use them to develop LLM-Neo. Experimental results on compressing Llama 2 and Llama 3 show that LLM-Neo outperforms various baselines. Further analysis demonstrates the robustness of LLM-Neo across LoRA variants. The trained models are available at this repository.
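The abstract describes combining LoRA with KD. The sketch below illustrates that idea under stated assumptions.

- **LoRA part.** A frozen student weight `W` receives a trainable low-rank update `B @ A`, and only `A` and `B` would be trained.
- **KD part.** The training signal is a standard temperature-softened KL distillation loss against teacher logits.
- **Loss mixing.** The KD loss is mixed with ordinary cross-entropy through a hypothetical weight `alpha`.

This is a generic NumPy illustration, not the LLM-Neo implementation from the linked repositories. The names `LoRALinear`, `kd_loss`, `cross_entropy`, `combined_loss`, `rank`, `alpha`, and `temperature` are all hypothetical. Only the forward pass and loss are computed; the gradient update is omitted.

```python
import numpy as np

def log_softmax(z, axis=-1):
    # Subtract the max for numerical stability before exponentiating
    z = z - z.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))

def softmax(z, axis=-1):
    return np.exp(log_softmax(z, axis=axis))

class LoRALinear:
    """Hypothetical LoRA layer: frozen weight W (d_out x d_in) plus scaled B @ A."""
    def __init__(self, weight, rank=4, lora_alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = weight.shape
        self.weight = weight                                 # frozen, never updated
        self.A = rng.normal(0.0, 0.01, size=(rank, d_in))    # trainable
        self.B = np.zeros((d_out, rank))                     # trainable; zero init makes the adapter a no-op at start
        self.scale = lora_alpha / rank

    def __call__(self, x):
        # x: (batch, d_in) -> (batch, d_out)
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions, scaled by T^2."""
    log_p_t = log_softmax(teacher_logits / temperature)
    log_p_s = log_softmax(student_logits / temperature)
    kl = (np.exp(log_p_t) * (log_p_t - log_p_s)).sum(axis=-1)
    return float(kl.mean() * temperature ** 2)

def cross_entropy(logits, labels):
    log_p = log_softmax(logits)
    return float(-log_p[np.arange(len(labels)), labels].mean())

def combined_loss(student_logits, teacher_logits, labels, alpha=0.5, temperature=2.0):
    # alpha is an assumed mixing weight, not a value taken from the paper
    return (alpha * cross_entropy(student_logits, labels)
            + (1 - alpha) * kd_loss(student_logits, teacher_logits, temperature))

# Usage with random toy data: batch of 3, hidden size 16, vocabulary of 10
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 16))
W_student = rng.normal(size=(10, 16))
layer = LoRALinear(W_student, rank=4)
teacher_logits = rng.normal(size=(3, 10))
labels = np.array([1, 4, 7])
student_logits = layer(x)
loss = combined_loss(student_logits, teacher_logits, labels)
print(loss)
```

**Design choices.**

- Zero-initializing `B` means the student starts exactly at its pretrained weights.
- Only `A` and `B` (104 values here, versus 160 in `W`) would receive gradients. This is where the parameter efficiency comes from.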

Tasks

Knowledge Distillation, Language Modeling, Large Language Model, Transfer Learning
