Language Modelling
A language model is a probabilistic model of natural language: it assigns probabilities to sequences of words or tokens. Language models are useful for a variety of tasks, including speech recognition, machine translation, natural language generation (generating more human-like text), optical character recognition, route optimization, handwriting recognition, grammar induction, and information retrieval.
Large language models (LLMs), currently their most advanced form, are predominantly based on transformers trained on large datasets (frequently text scraped from the public internet). They have superseded recurrent neural network-based models, which had previously superseded purely statistical models such as the word n-gram language model.
Source: Wikipedia
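To make the idea of a purely statistical model concrete, here is a minimal sketch of a word bigram (2-gram) language model in Python. It is an illustration only, not taken from the source: the toy corpus, the function names (`train_bigram`, `bigram_prob`, `sentence_log_prob`), whitespace tokenization, and add-one (Laplace) smoothing are all assumptions chosen for brevity.

```python
from collections import Counter, defaultdict
import math


def train_bigram(sentences):
    """Count word bigrams over whitespace-tokenized sentences.

    "<s>" and "</s>" are hypothetical boundary markers added for this sketch.
    """
    counts = defaultdict(Counter)
    vocab = set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            counts[prev][cur] += 1
    return counts, vocab


def bigram_prob(counts, vocab, prev, cur):
    """Estimate P(cur | prev) with add-one (Laplace) smoothing."""
    context = counts.get(prev, Counter())
    return (context[cur] + 1) / (sum(context.values()) + len(vocab))


def sentence_log_prob(counts, vocab, sentence):
    """Sum the log-probabilities of each word given the previous word."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    return sum(
        math.log(bigram_prob(counts, vocab, prev, cur))
        for prev, cur in zip(tokens, tokens[1:])
    )


# Toy corpus, invented for illustration.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat chased the dog",
]
counts, vocab = train_bigram(corpus)

fluent = sentence_log_prob(counts, vocab, "the cat sat on the rug")
scrambled = sentence_log_prob(counts, vocab, "rug the on sat cat the")
print(f"fluent: {fluent:.2f}  scrambled: {scrambled:.2f}")
```

On this toy corpus the model scores the fluent word order higher than the scrambled one, because every bigram in the fluent sentence was observed during training. Neural models replace the count table with learned parameters, but they serve the same purpose of assigning probabilities to sequences.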
Datasets: WikiText-103, Penn Treebank (Word Level), enwik8, The Pile, WikiText-2, LAMBADA, One Billion Word, Text8, Penn Treebank (Character Level), Hutter Prize, OpenWebText, SALMon
Benchmark Results
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PaLM-540B (few-shot) | Accuracy | 89.7 | — | Unverified |
| 2 | PaLM 2-L (one-shot) | Accuracy | 86.9 | — | Unverified |
| 3 | GPT-3 175B (few-shot) | Accuracy | 86.4 | — | Unverified |
| 4 | LLaMA-65B+CFG (zero-shot) | Accuracy | 84 | — | Unverified |
| 5 | LLaMA-30B+CFG (zero-shot) | Accuracy | 83.9 | — | Unverified |
| 6 | PaLM 2-M (one-shot) | Accuracy | 83.7 | — | Unverified |
| 7 | Cohere Large | Accuracy | 82.33 | — | Unverified |
| 8 | LLaMA-13B+CFG (zero-shot) | Accuracy | 82.2 | — | Unverified |
| 9 | PaLM-540B (one-shot) | Accuracy | 81.8 | — | Unverified |
| 10 | GLaM 62B/64E (one-shot) | Accuracy | 80.9 | — | Unverified |