| The Stability-Efficiency Dilemma: Investigating Sequence Length Warmup for Training GPT Models | Aug 13, 2021 | LAMBADA, Text Generation | Unverified | 0 |
| E.T.: Entity-Transformers. Coreference augmented Neural Language Model for richer mention representations via Entity-Transformer blocks | Nov 10, 2020 | LAMBADA, Language Modeling | Unverified | 0 |
| Headless Language Models: Learning without Predicting with Contrastive Weight Tying | Sep 15, 2023 | LAMBADA | Unverified | 0 |
| Stay on topic with Classifier-Free Guidance | Jun 30, 2023 | Code Generation, Common Sense Reasoning | Unverified | 0 |
| SymBa: Symbolic Backward Chaining for Structured Natural Language Reasoning | Feb 20, 2024 | Arithmetic Reasoning, GSM8K | Unverified | 0 |
| Entity Tracking Improves Cloze-style Reading Comprehension | Oct 5, 2018 | LAMBADA, Reading Comprehension | Code Available | 0 |
| Universal Transformers | Jul 10, 2018 | Inductive Bias, LAMBADA | Code Available | 0 |
| Neural Shuffle-Exchange Networks -- Sequence Processing in O(n log n) Time | Jul 18, 2019 | LAMBADA, Language Modeling | Code Available | 0 |
| Inconsistencies in Masked Language Models | Dec 30, 2022 | LAMBADA, MMLU | Code Available | 0 |
| Neural Shuffle-Exchange Networks - Sequence Processing in O(n log n) Time | Dec 1, 2019 | LAMBADA, Language Modeling | Code Available | 0 |