| PolyPythias: Stability and Outliers across Fifty Language Model Pre-Training Runs | Mar 12, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models | Mar 12, 2025 | Denoising, Language Modeling | Code Available | 4 |
| BAMBI: Developing Baby Language Models for Italian | Mar 12, 2025 | Language Acquisition, Language Modeling | Unverified | 0 |
| xVLM2Vec: Adapting LVLM-based embedding models to multilinguality using Self-Knowledge Distillation | Mar 12, 2025 | Knowledge Distillation, Language Modeling | Unverified | 0 |
| Communication-Efficient Language Model Training Scales Reliably and Robustly: Scaling Laws for DiLoCo | Mar 12, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Language-Enhanced Representation Learning for Single-Cell Transcriptomics | Mar 12, 2025 | Language Modeling, Language Modelling | Code Available | 0 |
| SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability | Mar 12, 2025 | Disentanglement, Language Modeling | Unverified | 0 |
| Global Position Aware Group Choreography using Large Language Model | Mar 12, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Token Weighting for Long-Range Language Modeling | Mar 12, 2025 | Language Modeling, Language Modelling | Code Available | 0 |
| Why LLMs Cannot Think and How to Fix It | Mar 12, 2025 | Language Modeling, Language Modelling | Unverified | 0 |