| Symbolic Mixture-of-Experts: Adaptive Skill-based Routing for Heterogeneous Reasoning | Mar 7, 2025 | GPUMath | —Unverified | 0 |
| Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models | Oct 9, 2023 | MMLU | —Unverified | 0 |
| TeacherLM: Teaching to Fish Rather Than Giving the Fish, Language Modeling Likewise | Oct 29, 2023 | Data AugmentationLanguage Modeling | —Unverified | 0 |
| The Alignment Ceiling: Objective Mismatch in Reinforcement Learning from Human Feedback | Oct 31, 2023 | GSM8KMMLU | —Unverified | 0 |
| The Base-Rate Effect on LLM Benchmark Performance: Disambiguating Test-Taking Strategies from Benchmark Performance | Jun 17, 2024 | counterfactualMMLU | —Unverified | 0 |
| The Claude 3 Model Family: Opus, Sonnet, Haiku | Mar 4, 2024 | 1 Image, 2*2 StitchingArithmetic Reasoning | —Unverified | 0 |
| The Poison of Alignment | Aug 25, 2023 | MMLU | —Unverified | 0 |
| The Vulnerability of Language Model Benchmarks: Do They Accurately Reflect True LLM Performance? | Dec 2, 2024 | Language ModelingLanguage Modelling | —Unverified | 0 |
| Uncovering Latent Chain of Thought Vectors in Language Models | Sep 21, 2024 | ARCGSM8K | —Unverified | 0 |
| Tokenization Standards for Linguistic Integrity: Turkish as a Benchmark | Feb 10, 2025 | MMLUMorphological Analysis | —Unverified | 0 |