
Hyperparameter Power Impact in Transformer Language Model Training

2021-11-01 · EMNLP 2021 (SustaiNLP workshop)

Lucas Høyberg Puvis de Chavannes, Mads Guldborg Kjeldgaard Kongsbak, Timmie Rantzau, Leon Derczynski

Abstract

Training large language models can consume a large amount of energy. We hypothesize that the language model’s configuration impacts its energy consumption, and that there is room for power consumption optimisation in modern large language models. To investigate these claims, we introduce a power consumption factor to the objective function, and explore the range of models and hyperparameter configurations that affect power. We identify multiple configuration factors that can reduce power consumption during language model training while retaining model quality.
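The abstract describes adding a power consumption factor to the objective function, but the paper's exact formulation is not reproduced on this page. Below is a minimal, hypothetical sketch of one way such a term could be attached to a training loss: the penalty uses a live GPU power reading via NVML as a plain scalar, so it re-ranks configurations rather than back-propagating through the hardware. All names here (`power_aware_loss`, `power_weight`, `read_gpu_power_watts`) and the specific power budget are illustrative assumptions, not the authors' method.

```python
# Hypothetical sketch: a power-aware training objective.
# Assumes an NVML-capable GPU; falls back to 0 W otherwise.
import torch
import torch.nn as nn

try:
    import pynvml
    pynvml.nvmlInit()
    _handle = pynvml.nvmlDeviceGetHandleByIndex(0)

    def read_gpu_power_watts() -> float:
        # NVML reports milliwatts; convert to watts.
        return pynvml.nvmlDeviceGetPowerUsage(_handle) / 1000.0
except Exception:
    def read_gpu_power_watts() -> float:
        # Fallback when no GPU / NVML is available.
        return 0.0


def power_aware_loss(task_loss: torch.Tensor,
                     power_weight: float = 1e-3,
                     power_budget_watts: float = 250.0) -> torch.Tensor:
    """Add a penalty proportional to measured power draw to the task loss.

    The power term is a detached scalar (no gradient flows through it),
    so it shifts which hyperparameter configurations look best rather
    than changing the gradient of the model parameters.
    """
    power_ratio = read_gpu_power_watts() / power_budget_watts
    return task_loss + power_weight * power_ratio


if __name__ == "__main__":
    # Toy usage with a dummy classifier and random data.
    model = nn.Linear(16, 2)
    x, y = torch.randn(8, 16), torch.randint(0, 2, (8,))
    task_loss = nn.functional.cross_entropy(model(x), y)
    loss = power_aware_loss(task_loss)
    loss.backward()
    print(f"combined loss: {loss.item():.4f}")
```

In a hyperparameter search, the combined value could be logged per configuration alongside accuracy, so that two configurations of similar quality can be distinguished by their power cost; this is one plausible reading of the abstract, not a reconstruction of the paper's implementation.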
