| Training Compute-Optimal Large Language Models | Mar 29, 2022 | Anachronisms, Analogical Similarity | Code Available | 6 | 5 |
| Scaling Language Models: Methods, Analysis & Insights from Training Gopher | Dec 8, 2021 | Abstract Algebra, Anachronisms | Code Available | 2 | 5 |
| Spectra: Surprising Effectiveness of Pretraining Ternary Language Models at Scale | Jul 17, 2024 | GPU, LAMBADA | Code Available | 2 | 5 |
| Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism | Sep 17, 2019 | GPU, LAMBADA | Code Available | 2 | 5 |
| Beyond Autoregression: Fast LLMs via Self-Distillation Through Time | Oct 28, 2024 | Automated Theorem Proving, Code Generation | Code Available | 1 | 5 |
| Residual Shuffle-Exchange Networks for Fast Processing of Long Sequences | Apr 6, 2020 | LAMBADA, Language Modelling | Code Available | 1 | 5 |
| Explaining and Improving Contrastive Decoding by Extrapolating the Probabilities of a Huge and Hypothetical LM | Nov 3, 2024 | LAMBADA, Text Generation | Code Available | 1 | 5 |
| The LAMBADA dataset: Word prediction requiring a broad discourse context | Jun 20, 2016 | LAMBADA, Sentence | Code Available | 1 | 5 |
| Neural Shuffle-Exchange Networks - Sequence Processing in O(n log n) Time | Dec 1, 2019 | LAMBADA, Language Modeling | Code Available | 0 | 5 |
| Entity Tracking Improves Cloze-style Reading Comprehension | Oct 5, 2018 | LAMBADA, Reading Comprehension | Code Available | 0 | 5 |