| ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools | Jun 18, 2024 | All, GSM8K | Code Available | 14 |
| Qwen2 Technical Report | Jul 15, 2024 | Arithmetic Reasoning, GSM8K | Code Available | 13 |
| SCORE: Systematic COnsistency and Robustness Evaluation for Large Language Models | Feb 28, 2025 | MMLU | Code Available | 11 |
| LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning | Mar 26, 2024 | GPU, GSM8K | Code Available | 9 |
| Yi: Open Foundation Models by 01.AI | Mar 7, 2024 | Attribute, Chatbot | Code Available | 9 |
| DataComp-LM: In search of the next generation of training sets for language models | Jun 17, 2024 | Language Modelling, MMLU | Code Available | 7 |
| Revisiting MoE and Dense Speed-Accuracy Comparisons for LLM Training | May 23, 2024 | GSM8K, Mixture-of-Experts | Code Available | 7 |
| Efficient multi-prompt evaluation of LLMs | May 27, 2024 | MMLU | Code Available | 7 |
| Qwen2.5-Omni Technical Report | Mar 26, 2025 | Automatic Speech Recognition (ASR), GSM8K | Code Available | 7 |
| ART: Automatic multi-step reasoning and tool-use for large language models | Mar 16, 2023 | MMLU | Code Available | 6 |