| Yi: Open Foundation Models by 01.AI | Mar 7, 2024 | Attribute, Chatbot | Code Available | 9 | 5 |
| Scaling Granite Code Models to 128K Context | Jul 18, 2024 | 2k, 4k | Code Available | 4 | 5 |
| Rho-1: Not All Tokens Are What You Need | Apr 11, 2024 | All, Continual Pretraining | Code Available | 3 | 5 |
| Data Engineering for Scaling Language Models to 128K Context | Feb 15, 2024 | 4k, Continual Pretraining | Code Available | 3 | 5 |
| Retrieval Head Mechanistically Explains Long-Context Factuality | Apr 24, 2024 | Continual Pretraining, Hallucination | Code Available | 3 | 5 |
| MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning | May 20, 2024 | Continual Pretraining, Mathematical Reasoning | Code Available | 3 | 5 |
| Continual Training of Language Models for Few-Shot Learning | Oct 11, 2022 | Continual Learning, Continual Pretraining | Code Available | 2 | 5 |
| A Practitioner's Guide to Continual Multimodal Pretraining | Aug 26, 2024 | Continual Learning, Continual Pretraining | Code Available | 2 | 5 |
| Continual Pre-training of Language Models | Feb 7, 2023 | Continual Learning, Continual Pretraining | Code Available | 2 | 5 |
| Autonomous Data Selection with Zero-shot Generative Classifiers for Mathematical Texts | Feb 12, 2024 | Continual Pretraining, GSM8K | Code Available | 2 | 5 |