| Llama 3 Meets MoE: Efficient Upcycling | Dec 13, 2024 | Mixture-of-Experts, MMLU | Unverified | 0 | 0 |
| LLaMA Beyond English: An Empirical Study on Language Capability Transfer | Jan 2, 2024 | GPU, Informativeness | Unverified | 0 | 0 |
| LLaMA-Excitor: General Instruction Tuning via Indirect Feature Interaction | Apr 1, 2024 | Image Captioning, Instruction Following | Unverified | 0 | 0 |
| Large Language Model Compression with Neural Architecture Search | Oct 9, 2024 | Instruction Following, Language Modeling | Unverified | 0 | 0 |
| LLM Distillation for Efficient Few-Shot Multiple Choice Question Answering | Dec 13, 2024 | Few-Shot Learning, Knowledge Distillation | Unverified | 0 | 0 |
| LLMs Outperform Experts on Challenging Biology Benchmarks | May 9, 2025 | MMLU, Virology | Unverified | 0 | 0 |
| LM-Cocktail: Resilient Tuning of Language Models via Model Merging | Nov 22, 2023 | Language Modeling | Unverified | 0 | 0 |
| LongPerceptualThoughts: Distilling System-2 Reasoning for System-1 Perception | Apr 21, 2025 | Math, MMLU | Unverified | 0 | 0 |
| An Empirical Study of Mamba-based Language Models | Jun 12, 2024 | 16k, In-Context Learning | Unverified | 0 | 0 |
| Measuring Hong Kong Massive Multi-Task Language Understanding | May 4, 2025 | MMLU, Multi-task Language Understanding | Unverified | 0 | 0 |