Language Modelling
A language model is a probabilistic model of natural language: it assigns probabilities to sequences of words or characters. Language models are useful for a variety of tasks, including speech recognition, machine translation, natural language generation (producing human-like text), optical character recognition, route optimization, handwriting recognition, grammar induction, and information retrieval.
Large language models (LLMs), currently their most advanced form, are predominantly based on transformers trained on large datasets (frequently using text scraped from the public internet). They have superseded recurrent neural network-based models, which had in turn superseded purely statistical models such as the word n-gram language model.
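To make the purely statistical approach concrete, a word-bigram model can be estimated from raw co-occurrence counts alone. The sketch below is illustrative (the toy corpus and the `bigram_prob` helper are made up for this example, not taken from any of the benchmarked systems):

```python
from collections import Counter

# Minimal word-bigram language model: P(w_i | w_{i-1}) is the
# maximum-likelihood estimate from adjacent-pair counts in a tiny corpus.
corpus = "the cat sat on the mat the cat ate".split()

pair_counts = Counter(zip(corpus, corpus[1:]))   # counts of (prev, word) pairs
history_counts = Counter(corpus[:-1])            # counts of words in the history slot

def bigram_prob(prev, word):
    """MLE of P(word | prev); returns 0.0 for an unseen history word."""
    if history_counts[prev] == 0:
        return 0.0
    return pair_counts[(prev, word)] / history_counts[prev]

print(bigram_prob("the", "cat"))  # → 0.6666666666666666 (2 of 3 "the" are followed by "cat")
```

Real n-gram systems add smoothing (e.g. Kneser-Ney) so unseen pairs do not get zero probability, but the counting core is exactly this.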
Source: Wikipedia
Papers
Showing 1–10 of 17,610 papers
Datasets: WikiText-103, Penn Treebank (Word Level), enwik8, The Pile, WikiText-2, LAMBADA, One Billion Word, Text8, Penn Treebank (Character Level), Hutter Prize, OpenWebText, SALMon
Benchmark Results
WikiText-103
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Decay RNN | Validation perplexity | 76.67 | — | Unverified |
| 2 | GRU | Validation perplexity | 53.78 | — | Unverified |
| 3 | LSTM | Validation perplexity | 52.73 | — | Unverified |
| 4 | LSTM | Test perplexity | 48.7 | — | Unverified |
| 5 | Temporal CNN | Test perplexity | 45.2 | — | Unverified |
| 6 | TCN | Test perplexity | 45.19 | — | Unverified |
| 7 | GCNN-8 | Test perplexity | 44.9 | — | Unverified |
| 8 | Neural cache model (size = 100) | Test perplexity | 44.8 | — | Unverified |
| 9 | Neural cache model (size = 2,000) | Test perplexity | 40.8 | — | Unverified |
| 10 | GPT-2 Small | Test perplexity | 37.5 | — | Unverified |
Penn Treebank (Word Level)
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | TCN | Test perplexity | 108.47 | — | Unverified |
| 2 | Seq-U-Net | Test perplexity | 107.95 | — | Unverified |
| 3 | GRU (Bai et al., 2018) | Test perplexity | 92.48 | — | Unverified |
| 4 | R-Transformer | Test perplexity | 84.38 | — | Unverified |
| 5 | Zaremba et al. (2014) - LSTM (medium) | Test perplexity | 82.7 | — | Unverified |
| 6 | Gal & Ghahramani (2016) - Variational LSTM (medium) | Test perplexity | 79.7 | — | Unverified |
| 7 | LSTM (Bai et al., 2018) | Test perplexity | 78.93 | — | Unverified |
| 8 | Zaremba et al. (2014) - LSTM (large) | Test perplexity | 78.4 | — | Unverified |
| 9 | Gal & Ghahramani (2016) - Variational LSTM (large) | Test perplexity | 75.2 | — | Unverified |
| 10 | Inan et al. (2016) - Variational RHN | Test perplexity | 66 | — | Unverified |
enwik8
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | LSTM (7 layers) | Bit per Character (BPC) | 1.67 | — | Unverified |
| 2 | Hypernetworks | Bit per Character (BPC) | 1.34 | — | Unverified |
| 3 | SHA-LSTM (4 layers, h=1024, no attention head) | Bit per Character (BPC) | 1.33 | — | Unverified |
| 4 | LN HM-LSTM | Bit per Character (BPC) | 1.32 | — | Unverified |
| 5 | ByteNet | Bit per Character (BPC) | 1.31 | — | Unverified |
| 6 | Recurrent Highway Networks | Bit per Character (BPC) | 1.27 | — | Unverified |
| 7 | Large FS-LSTM-4 | Bit per Character (BPC) | 1.25 | — | Unverified |
| 8 | Large mLSTM | Bit per Character (BPC) | 1.24 | — | Unverified |
| 9 | AWD-LSTM (3 layers) | Bit per Character (BPC) | 1.23 | — | Unverified |
| 10 | Cluster-Former (#C=512) | Bit per Character (BPC) | 1.22 | — | Unverified |
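The character-level results above are reported in bits per character (BPC), the base-2 cross-entropy per character; it relates to per-character perplexity by ppl = 2^BPC. A small conversion sketch (function names are illustrative):

```python
import math

def bpc_to_perplexity(bpc):
    """Per-character perplexity implied by a bits-per-character score."""
    return 2.0 ** bpc

def perplexity_to_bpc(ppl):
    """Bits per character implied by a per-character perplexity."""
    return math.log2(ppl)

# A BPC of 1.0 means the model is as uncertain as a fair coin flip
# per character, i.e. per-character perplexity 2.
print(bpc_to_perplexity(1.0))   # → 2.0
print(bpc_to_perplexity(1.22))  # Cluster-Former's 1.22 BPC ≈ perplexity 2.33
```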
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Smaller Transformer 126M (pre-trained) | Test perplexity | 33 | — | Unverified |
| 2 | OPT 125M | Test perplexity | 32.26 | — | Unverified |
| 3 | Larger Transformer 771M (pre-trained) | Test perplexity | 28.1 | — | Unverified |
| 4 | OPT 1.3B | Test perplexity | 19.55 | — | Unverified |
| 5 | GPT-Neo 125M | Test perplexity | 17.83 | — | Unverified |
| 6 | OPT 2.7B | Test perplexity | 17.81 | — | Unverified |
| 7 | Smaller Transformer 126M (fine-tuned) | Test perplexity | 12 | — | Unverified |
| 8 | GPT-Neo 1.3B | Test perplexity | 11.46 | — | Unverified |
| 9 | Transformer 125M | Test perplexity | 10.7 | — | Unverified |
| 10 | GPT-Neo 2.7B | Test perplexity | 10.44 | — | Unverified |