Language Modelling
A language model is a probabilistic model of natural language. Language models are useful for a variety of tasks, including speech recognition, machine translation, natural language generation (producing human-like text), optical character recognition, route optimization, handwriting recognition, grammar induction, and information retrieval.
Large language models (LLMs), currently their most advanced form, are predominantly based on transformers trained on large datasets (frequently using text scraped from the public internet). They have superseded recurrent neural network-based models, which had in turn superseded purely statistical models such as the word n-gram language model.
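To make the contrast concrete, a word n-gram model of the statistical kind mentioned above can be sketched in a few lines. This is a minimal toy illustration, not any particular published system; the corpus and function names are hypothetical, and the model uses simple add-one (Laplace) smoothing.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus; real models are trained on far larger data.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

vocab = set(corpus)
unigram_counts = Counter(corpus)
bigram_counts = defaultdict(Counter)
for prev, word in zip(corpus, corpus[1:]):
    bigram_counts[prev][word] += 1

def bigram_prob(prev, word):
    # P(word | prev) with add-one smoothing over the vocabulary,
    # so unseen pairs still receive nonzero probability.
    return (bigram_counts[prev][word] + 1) / (unigram_counts[prev] + len(vocab))
```

For example, `bigram_prob("sat", "on")` is higher than `bigram_prob("sat", "the")` because "sat on" occurs twice in the corpus while "sat the" never does, and the smoothed probabilities over the vocabulary still sum to 1 for any context word.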
Source: Wikipedia
Papers
Showing 1–10 of 17610 papers
Datasets: All datasets, WikiText-103, Penn Treebank (Word Level), enwik8, The Pile, WikiText-2, LAMBADA, One Billion Word, Text8, Penn Treebank (Character Level), Hutter Prize, OpenWebText, SALMon
Benchmark Results
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | OPT-175B (50% Sparsity) | Test perplexity | 234.77 | — | Unverified |
| 2 | Grave et al. (2016) - LSTM | Test perplexity | 99.3 | — | Unverified |
| 3 | Inan et al. (2016) - Variational LSTM (tied) (h=650) | Test perplexity | 87.7 | — | Unverified |
| 4 | Inan et al. (2016) - Variational LSTM (tied) (h=650) + augmented loss | Test perplexity | 87 | — | Unverified |
| 5 | Grave et al. (2016) - LSTM + continuous cache pointer | Test perplexity | 68.9 | — | Unverified |
| 6 | EGRU | Test perplexity | 68.9 | — | Unverified |
| 7 | Melis et al. (2017) - 1-layer LSTM (tied) | Test perplexity | 65.9 | — | Unverified |
| 8 | AWD-LSTM | Test perplexity | 65.8 | — | Unverified |
| 9 | AWD-LSTM + ATOI | Test perplexity | 64.73 | — | Unverified |
| 10 | AWD-LSTM 3-layer with Fraternal dropout | Test perplexity | 64.1 | — | Unverified |
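The test perplexity reported in the table above is the exponentiated average negative log-likelihood per token, so lower values mean the model assigns higher probability to the held-out text. A minimal sketch of the computation, assuming per-token probabilities from some model are already available:

```python
import math

def perplexity(token_probs):
    # Perplexity = exp of the mean negative log-probability per token.
    n = len(token_probs)
    return math.exp(-sum(math.log(p) for p in token_probs) / n)

# A model that assigns each token probability 1/100 (e.g. a uniform
# distribution over a 100-word vocabulary) has perplexity ~100.
print(perplexity([0.01] * 50))
```

Under this metric, the gap between 234.77 and 64.1 in the table corresponds to a large difference in average per-token probability.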