| Model-Generated Pretraining Signals Improves Zero-Shot Generalization of Text-to-Text Transformers | May 21, 2023 | MMLU, Zero-shot Generalization | Code Available | 1 | 5 |
| OwLore: Outlier-weighed Layerwise Sampled Low-Rank Projection for Memory-Efficient LLM Fine-tuning | May 28, 2024 | MMLU | Code Available | 1 | 5 |
| MEXA: Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment | Oct 8, 2024 | ARC, Belebele | Code Available | 1 | 5 |
| Efficient Online Data Mixing For Language Model Pre-Training | Dec 5, 2023 | Language Modeling, Language Modelling | Code Available | 1 | 5 |
| Mobile-MMLU: A Mobile Intelligence Language Understanding Benchmark | Mar 26, 2025 | MMLU, Multiple-choice | Code Available | 1 | 5 |
| A Dynamic LLM-Powered Agent Network for Task-Oriented Agent Collaboration | Oct 3, 2023 | Arithmetic Reasoning, Code Generation | Code Available | 1 | 5 |
| ArabLegalEval: A Multitask Benchmark for Assessing Arabic Legal Knowledge in Large Language Models | Aug 15, 2024 | In-Context Learning, MMLU | Code Available | 1 | 5 |
| MyGO Multiplex CoT: A Method for Self-Reflection in Large Language Models via Double Chain of Thought Thinking | Jan 20, 2025 | Decision Making, GSM8K | Code Available | 1 | 5 |
| An Open Source Data Contamination Report for Large Language Models | Oct 26, 2023 | HellaSwag, Language Modeling | Code Available | 1 | 5 |
| Decentralized Arena: Towards Democratic and Scalable Automatic Evaluation of Language Models | May 19, 2025 | Benchmarking, Chatbot | Code Available | 1 | 5 |