
Exploring GLU Expansion Ratios: A Study of Structured Pruning in LLaMA-3.2 Models

2024-12-26 · OSF Preprints 2024 · Code Available

Pere Martra.


Abstract

Large language models with GLU architectures are typically designed with significant expansion ratios in their MLP layers, where the intermediate dimension is several times larger than the input dimension. While various pruning techniques have been proposed to reduce model size, the relationship between this expansion capacity and model performance has remained unexplored. This paper presents a systematic investigation of the GLU expansion ratio as a key metric for model pruning, using Llama-3.2 models (1B and 3B variants) as case studies. Our findings reveal that models pruned to an expansion ratio of 140% consistently outperform others, balancing redundancy reduction, task-specific performance, and environmental sustainability. For instance, Llama-3.2-1B at 40% pruning and Llama-3.2-3B at 10% pruning surpassed their respective baselines on multiple benchmarks, including BoolQ, IFEval, and MUSR. Beyond performance, this study underscores significant environmental benefits: pruning the 3B model to 140% expansion (10% pruning) reduced CO2 emissions by 50%, showing that pruning can improve computational efficiency while maintaining robust performance. Future research should extend these findings to other GLU-based architectures, such as Mistral, Qwen, and Microsoft Phi, to validate their broader applicability. This study provides a novel perspective on sustainable AI development, bridging the goals of performance optimization and environmental responsibility.
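The core operation the abstract describes, structured pruning of a GLU MLP block down to a target expansion ratio, can be sketched as follows. This is an illustrative assumption, not the paper's exact procedure: the `prune_glu_block` helper and its magnitude-based neuron-importance heuristic are hypothetical, and real GLU blocks (e.g. Llama's gate/up/down projections) would be pruned on their actual weight tensors.

```python
import numpy as np

def prune_glu_block(gate_w, up_w, down_w, target_expansion):
    """Prune a GLU MLP block to a target expansion ratio (illustrative sketch).

    gate_w, up_w: (intermediate, hidden) projection weights.
    down_w:       (hidden, intermediate) projection weights.
    Keeps the highest-scoring intermediate neurons so that
    intermediate / hidden == target_expansion.
    """
    hidden = gate_w.shape[1]
    keep = int(round(target_expansion * hidden))
    # Hypothetical importance proxy: total weight magnitude of each
    # intermediate neuron across the gate and up projections.
    importance = np.abs(gate_w).sum(axis=1) + np.abs(up_w).sum(axis=1)
    idx = np.sort(np.argsort(importance)[-keep:])  # indices of kept neurons
    # Remove the same neurons consistently from all three projections.
    return gate_w[idx], up_w[idx], down_w[:, idx]

# Toy dimensions: hidden=8, intermediate=32 (400% expansion), pruned to 140%.
rng = np.random.default_rng(0)
g = rng.normal(size=(32, 8))
u = rng.normal(size=(32, 8))
d = rng.normal(size=(8, 32))
g2, u2, d2 = prune_glu_block(g, u, d, 1.4)
print(g2.shape, u2.shape, d2.shape)  # (11, 8) (11, 8) (8, 11): 11 = round(1.4 * 8)
```

Because whole intermediate neurons are removed (rows of the gate/up projections and the matching columns of the down projection), the pruned block stays a dense, smaller GLU layer that needs no sparse kernels, which is what makes this kind of structured pruning translate directly into compute and emissions savings.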
