| Matryoshka Model Learning for Improved Elastic Student Models | May 29, 2025 | LAMBADA, Math | Unverified | 0 |
| AdaGC: Improving Training Stability for Large Language Model Pretraining | Feb 16, 2025 | LAMBADA, Language Modeling | Unverified | 0 |
| Explaining and Improving Contrastive Decoding by Extrapolating the Probabilities of a Huge and Hypothetical LM | Nov 3, 2024 | LAMBADA, Text Generation | Code Available | 1 |
| Beyond Autoregression: Fast LLMs via Self-Distillation Through Time | Oct 28, 2024 | Automated Theorem Proving, Code Generation | Code Available | 1 |
| Spectra: Surprising Effectiveness of Pretraining Ternary Language Models at Scale | Jul 17, 2024 | GPU, LAMBADA | Code Available | 2 |
| SymBa: Symbolic Backward Chaining for Structured Natural Language Reasoning | Feb 20, 2024 | Arithmetic Reasoning, GSM8K | Unverified | 0 |
| PIXAR: Auto-Regressive Language Modeling in Pixel Space | Jan 6, 2024 | Decoder, LAMBADA | Unverified | 0 |
| Concise and Organized Perception Facilitates Reasoning in Large Language Models | Oct 5, 2023 | LAMBADA, Math | Unverified | 0 |
| Headless Language Models: Learning without Predicting with Contrastive Weight Tying | Sep 15, 2023 | LAMBADA | Unverified | 0 |
| Stay on topic with Classifier-Free Guidance | Jun 30, 2023 | Code Generation, Common Sense Reasoning | Unverified | 0 |