| Biomed-Enriched: A Biomedical Dataset Enriched with LLMs for Pretraining and Extracting Rare and Hidden Content | Jun 25, 2025 | ArticlesContinual Pretraining | —Unverified | 0 |
| LLaVA-c: Continual Improved Visual Instruction Tuning | Jun 10, 2025 | Continual LearningContinual Pretraining | —Unverified | 0 |
| Simulating Training Data Leakage in Multiple-Choice Benchmarks for LLM Evaluation | May 30, 2025 | Continual PretrainingFairness | CodeCode Available | 0 |
| A Japanese Language Model and Three New Evaluation Benchmarks for Pharmaceutical NLP | May 22, 2025 | Continual PretrainingDiagnostic | CodeCode Available | 0 |
| Enhance Mobile Agents Thinking Process Via Iterative Preference Learning | May 18, 2025 | Continual Pretraining | —Unverified | 0 |
| Mining Hidden Thoughts from Texts: Evaluating Continual Pretraining with Synthetic Data for LLM Reasoning | May 15, 2025 | Continual PretrainingMMLU | —Unverified | 0 |
| Efficient Domain-adaptive Continual Pretraining for the Process Industry in the German Language | Apr 28, 2025 | Continual PretrainingGPU | —Unverified | 0 |
| TiC-LM: A Web-Scale Benchmark for Time-Continual LLM Pretraining | Apr 2, 2025 | Continual LearningContinual Pretraining | CodeCode Available | 1 |
| Enhancing Domain-Specific Encoder Models with LLM-Generated Data: How to Leverage Ontologies, and How to Do Without Them | Mar 27, 2025 | Continual PretrainingLanguage Modeling | —Unverified | 0 |
| Overcoming Vocabulary Mismatch: Vocabulary-agnostic Teacher Guided Language Modeling | Mar 24, 2025 | Continual PretrainingLanguage Modeling | —Unverified | 0 |