| Training Compute-Optimal Large Language Models | Mar 29, 2022 | Anachronisms, Analogical Similarity | Code Available | 6 |
| Spectra: Surprising Effectiveness of Pretraining Ternary Language Models at Scale | Jul 17, 2024 | GPU, LAMBADA | Code Available | 2 |
| Scaling Language Models: Methods, Analysis & Insights from Training Gopher | Dec 8, 2021 | Abstract Algebra, Anachronisms | Code Available | 2 |
| Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism | Sep 17, 2019 | GPU, LAMBADA | Code Available | 2 |
| Explaining and Improving Contrastive Decoding by Extrapolating the Probabilities of a Huge and Hypothetical LM | Nov 3, 2024 | LAMBADA, Text Generation | Code Available | 1 |
| Beyond Autoregression: Fast LLMs via Self-Distillation Through Time | Oct 28, 2024 | Automated Theorem Proving, Code Generation | Code Available | 1 |
| Residual Shuffle-Exchange Networks for Fast Processing of Long Sequences | Apr 6, 2020 | LAMBADA, Language Modelling | Code Available | 1 |
| The LAMBADA dataset: Word prediction requiring a broad discourse context | Jun 20, 2016 | LAMBADA, Sentence | Code Available | 1 |
| Matryoshka Model Learning for Improved Elastic Student Models | May 29, 2025 | LAMBADA, Math | Unverified | 0 |
| AdaGC: Improving Training Stability for Large Language Model Pretraining | Feb 16, 2025 | LAMBADA, Language Modeling | Unverified | 0 |
| SymBa: Symbolic Backward Chaining for Structured Natural Language Reasoning | Feb 20, 2024 | Arithmetic Reasoning, GSM8K | Unverified | 0 |
| PIXAR: Auto-Regressive Language Modeling in Pixel Space | Jan 6, 2024 | Decoder, LAMBADA | Unverified | 0 |
| Concise and Organized Perception Facilitates Reasoning in Large Language Models | Oct 5, 2023 | LAMBADA, Math | Unverified | 0 |
| Headless Language Models: Learning without Predicting with Contrastive Weight Tying | Sep 15, 2023 | LAMBADA | Unverified | 0 |
| Stay on topic with Classifier-Free Guidance | Jun 30, 2023 | Code Generation, Common Sense Reasoning | Unverified | 0 |
| Inconsistencies in Masked Language Models | Dec 30, 2022 | LAMBADA, MMLU | Code Available | 0 |
| LAMBADA: Backward Chaining for Automated Reasoning in Natural Language | Dec 20, 2022 | LAMBADA, Logical Reasoning | Unverified | 0 |
| Leveraging Relaxed Equilibrium by Lazy Transition for Sequence Modeling | May 1, 2022 | LAMBADA, Learning to Execute | Unverified | 0 |
| CoreLM: Coreference-aware Language Model Fine-Tuning | Nov 4, 2021 | LAMBADA, Language Modeling | Unverified | 0 |
| The Stability-Efficiency Dilemma: Investigating Sequence Length Warmup for Training GPT Models | Aug 13, 2021 | LAMBADA, Text Generation | Code Available | 0 |
| E.T.: Entity-Transformers. Coreference augmented Neural Language Model for richer mention representations via Entity-Transformer blocks | Nov 10, 2020 | LAMBADA, Language Modeling | Unverified | 0 |
| Neural Shuffle-Exchange Networks - Sequence Processing in O(n log n) Time | Dec 1, 2019 | LAMBADA, Language Modeling | Code Available | 0 |
| Attending to Entities for Better Text Understanding | Nov 11, 2019 | LAMBADA | Unverified | 0 |
| Not Enough Data? Deep Learning to the Rescue! | Nov 8, 2019 | Data Augmentation, Deep Learning | Code Available | 0 |
| Neural Shuffle-Exchange Networks -- Sequence Processing in O(n log n) Time | Jul 18, 2019 | LAMBADA, Language Modeling | Code Available | 0 |
| Entity Tracking Improves Cloze-style Reading Comprehension | Oct 5, 2018 | LAMBADA, Reading Comprehension | Code Available | 0 |
| Universal Transformers | Jul 10, 2018 | Inductive Bias, LAMBADA | Code Available | 0 |
| Neural Models for Reasoning over Multiple Mentions using Coreference | Apr 16, 2018 | LAMBADA, Reading Comprehension | Unverified | 0 |
| Linguistic Knowledge as Memory for Recurrent Neural Networks | Mar 7, 2017 | LAMBADA, Reading Comprehension | Unverified | 0 |
| Broad Context Language Modeling as Reading Comprehension | Oct 26, 2016 | coreference-resolution, Coreference Resolution | Unverified | 0 |