| Predicting Emergent Capabilities by Finetuning | Nov 25, 2024 | CoLA, GSM8K | —Unverified | 0 |
| BOTS-LM: Training Large Language Models for Setswana | Aug 5, 2024 | Computational Efficiency, Language Modeling | —Unverified | 0 |
| Teuken-7B-Base & Teuken-7B-Instruct: Towards European LLMs | Sep 30, 2024 | ARC, Diversity | —Unverified | 0 |
| Project MPG: towards a generalized performance benchmark for LLM capabilities | Oct 28, 2024 | Benchmarking, Chatbot | —Unverified | 0 |
| Pruning Large Language Models via Accuracy Predictor | Sep 18, 2023 | MMLU, Model Compression | —Unverified | 0 |
| ConceptPsy: A Benchmark Suite with Conceptual Comprehensiveness in Psychology | Nov 16, 2023 | MMLU, Multiple-choice | —Unverified | 0 |
| Quantifying Variance in Evaluation Benchmarks | Jun 14, 2024 | MMLU | —Unverified | 0 |
| ALLaM: Large Language Models for Arabic and English | Jul 22, 2024 | Decoder, Language Acquisition | —Unverified | 0 |
| AgentInstruct: Toward Generative Teaching with Agentic Flows | Jul 3, 2024 | GSM8K, MMLU | —Unverified | 0 |
| Reactor Mk.1 performances: MMLU, HumanEval and BBH test results | Jun 15, 2024 | Benchmarking, HumanEval | —Unverified | 0 |