SOTAVerified

An exploration of the effect of quantisation on energy consumption and inference time of StarCoder2

2024-11-15 · Code Available

Pepijn de Reus, Ana Oprescu, Jelle Zuidema


Abstract

This study examines quantisation and pruning strategies to reduce energy consumption during inference with code Large Language Models (LLMs). Using StarCoder2, we observe increased energy demands with quantisation due to lower throughput, along with some accuracy losses. Conversely, pruning reduces energy usage but impairs performance. The results highlight the challenges and trade-offs in LLM model compression. We suggest future work on hardware-optimised quantisation to enhance efficiency with minimal loss in accuracy.
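The accuracy losses mentioned above stem from the rounding error that quantisation introduces into model weights. As an illustrative sketch (not the paper's actual method), the following shows symmetric round-to-nearest int8 quantisation of a weight matrix, where the reconstruction error is bounded by half the quantisation scale:

```python
import numpy as np

def quantise_int8(w):
    # Symmetric round-to-nearest int8 quantisation:
    # the scale maps the largest |weight| to 127.
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantise(q, scale):
    # Recover approximate float weights from int8 codes.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=(4, 4)).astype(np.float32)

q, scale = quantise_int8(w)
w_hat = dequantise(q, scale)
# Per-weight error is at most scale / 2 (round-to-nearest).
err = np.max(np.abs(w - w_hat))
```

The int8 codes shrink storage roughly 4x relative to float32, but each weight is perturbed by up to half a quantisation step; at model scale these perturbations accumulate into the accuracy losses the abstract reports, and without hardware int8 kernels the extra dequantisation work can also lower throughput.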
