| Enhancing Domain-Specific Encoder Models with LLM-Generated Data: How to Leverage Ontologies, and How to Do Without Them | Mar 27, 2025 | Continual Pretraining, Language Modeling | Unverified | 0 |
| Overcoming Vocabulary Mismatch: Vocabulary-agnostic Teacher Guided Language Modeling | Mar 24, 2025 | Continual Pretraining, Language Modeling | Unverified | 0 |
| AfroXLMR-Social: Adapting Pre-trained Language Models for African Languages Social Media Text | Mar 24, 2025 | Continual Pretraining, Emotion Classification | Unverified | 0 |
| Robust Data Watermarking in Language Models by Injecting Fictitious Knowledge | Mar 6, 2025 | Continual Pretraining, Memorization | Code Available | 0 |
| Multilingual Machine Translation with Open Large Language Models at Practical Scale: An Empirical Study | Feb 4, 2025 | Continual Pretraining, Machine Translation | Unverified | 0 |
| Breaking the Stage Barrier: A Novel Single-Stage Approach to Long Context Extension for Large Language Models | Dec 10, 2024 | Continual Pretraining, Language Modeling | Unverified | 0 |
| Alchemy: Amplifying Theorem-Proving Capability through Symbolic Mutation | Oct 21, 2024 | Automated Theorem Proving, Continual Pretraining | Code Available | 0 |
| The Construction of Instruction-tuned LLMs for Finance without Instruction Data Using Continual Pretraining and Model Merging | Sep 30, 2024 | Continual Pretraining | Unverified | 0 |
| DoPAMine: Domain-specific Pre-training Adaptation from seed-guided data Mining | Sep 30, 2024 | Continual Pretraining, Domain Adaptation | Unverified | 0 |
| AstroMLab 2: AstroLLaMA-2-70B Model and Benchmarking Specialised LLMs for Astronomy | Sep 29, 2024 | Astronomy, Benchmarking | Unverified | 0 |