Lessons Learned from GPT-SW3: Building the First Large-Scale Generative Language Model for Swedish

2022-06-01 · LREC 2022

Ariel Ekgren, Amaru Cuba Gyllensten, Evangelia Gogoulou, Alice Heiman, Severine Verlinden, Joey Öhman, Fredrik Carlsson, Magnus Sahlgren

Abstract

We present GPT-SW3, a 3.5 billion parameter autoregressive language model trained on a newly created 100 GB Swedish corpus. This paper provides insights with regard to data collection and training, while highlighting the challenges of proper model evaluation. The results of quantitative evaluation through perplexity indicate that GPT-SW3 is a competent model in comparison with existing autoregressive models of similar size. Additionally, we perform an extensive prompting study which reveals the good text generation capabilities of GPT-SW3.

Tasks

Language Modelling, Text Generation
