SOTAVerified

LLM-Neo: Parameter Efficient Knowledge Distillation for Large Language Models

2024-11-11Code Available1· sign in to hype

Runming Yang, Taiqiang Wu, Jiahao Wang, Pengfei Hu, Ngai Wong, Yujiu Yang

Code Available — Be the first to reproduce this paper.

Reproduce

Code

Abstract

In this paper, we propose a novel LLM-Neo framework that efficiently transfers knowledge from a large language model (LLM) teacher to a compact student. Initially, we revisit the knowledge distillation (KD) and low-rank adaption (LoRA), and argue that they share the same paradigm. Inspired by this observation, we explore the strategy that combines LoRA and KD to enhance the efficiency of knowledge transfer. We first summarize some guidelines for this design and further develop the LLM-Neo. Experimental results on compressing Llama 2 and Llama 3 show that LLM-Neo outperforms various baselines. Further analysis demonstrates the robustness of the proposed LLM-Neo on variants of LoRA. The trained models have been available at this repository.

Tasks

Reproductions