Saving 77% of the Parameters in Large Language Models Technical Report
Joao Paulo Schwarz Schuler, Alejandra Rojas-Gómez
Abstract
This technical report demonstrates that large language models (LLMs) can maintain their learning capacity while reducing their non-embedding parameters by up to 77%. We achieve this by adapting a parameter-reduction technique originally developed for computer vision: dense layers are replaced with an optimized subnetwork of grouped pointwise convolutions. Using Microsoft's phi-3-mini-4k-instruct as the baseline, we show that our optimized model (kphi-3) achieves comparable validation loss while using only 15% to 23% of the original non-embedding parameters. All experiments were run on a single NVIDIA L4 GPU within three days, supporting the democratization of AI research. Our findings suggest that current LLM architectures may be substantially overparameterized, opening possibilities for more efficient model training and deployment.
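The core substitution can be illustrated with a minimal sketch. The snippet below is an assumption-laden illustration in PyTorch, not the report's exact subnetwork: the module name, the use of two stacked grouped convolutions joined by channel interleaving, and the group count are illustrative choices in the spirit of the computer-vision technique the report adapts. It replaces one dense layer with grouped pointwise (1x1) convolutions and prints the resulting parameter saving.

```python
import torch
import torch.nn as nn

class GroupedPointwise(nn.Module):
    """Hedged sketch: a dense layer replaced by two grouped pointwise
    (1x1) convolutions with channel interleaving between them. The exact
    subnetwork used for kphi-3 may differ; this only shows the principle."""

    def __init__(self, d_in: int, d_out: int, groups: int = 8):
        super().__init__()
        assert d_in % groups == 0 and d_out % groups == 0
        # Each grouped 1x1 conv costs d_in * d_out / groups weights,
        # versus d_in * d_out for the dense layer it replaces.
        self.conv1 = nn.Conv1d(d_in, d_out, kernel_size=1, groups=groups, bias=False)
        self.conv2 = nn.Conv1d(d_out, d_out, kernel_size=1, groups=groups)
        self.groups = groups

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_in); treat features as conv channels.
        h = self.conv1(x.transpose(1, 2))
        # Interleave channels so the second grouped conv mixes
        # information across the groups produced by the first.
        b, c, t = h.shape
        h = h.view(b, self.groups, c // self.groups, t).transpose(1, 2).reshape(b, c, t)
        return self.conv2(h).transpose(1, 2)

# Compare parameter counts at phi-3-mini's hidden size (3072).
dense = nn.Linear(3072, 3072)
grouped = GroupedPointwise(3072, 3072, groups=8)
n_dense = sum(p.numel() for p in dense.parameters())
n_grouped = sum(p.numel() for p in grouped.parameters())
print(f"dense: {n_dense:,}  grouped: {n_grouped:,}  saving: {1 - n_grouped / n_dense:.0%}")
```

With eight groups, this toy replacement already removes roughly three quarters of the dense layer's weights; the interleaving step is what keeps the groups from becoming isolated subnetworks.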