How to Train BERT with an Academic Budget

2021-04-15, EMNLP 2021

Peter Izsak, Moshe Berchansky, Omer Levy


Code

github.com/IntelLabs/academic-budget-bert (official, in paper; PyTorch; ★ 0)
github.com/peteriz/academic-budget-bert (official, in paper; PyTorch; ★ 0)
github.com/octanove/shiba (PyTorch; ★ 89)
github.com/yxzwang/normalized-information-payload (PyTorch; ★ 9)

Abstract

While large language models à la BERT are used ubiquitously in NLP, pretraining them is considered a luxury that only a few well-funded industry labs can afford. How can one train such models with a more modest budget? We present a recipe for pretraining a masked language model in 24 hours using a single low-end deep learning server. We demonstrate that through a combination of software optimizations, design choices, and hyperparameter tuning, it is possible to produce models that are competitive with BERT-base on GLUE tasks at a fraction of the original pretraining cost.

Tasks

Language Modeling, Linguistic Acceptability, Natural Language Inference, Question Answering, Semantic Textual Similarity, Sentiment Analysis

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
CoLA	24hBERT	Accuracy	57.1	—	Unverified
