| LoLCATs: On Low-Rank Linearizing of Large Language Models | Oct 14, 2024 | MMLU | Code Available | 3 | 5 |
| Scaling Instruction-Finetuned Language Models | Oct 20, 2022 | Coreference Resolution, Cross-Lingual Question Answering | Code Available | 3 | 5 |
| HadaCore: Tensor Core Accelerated Hadamard Transform Kernel | Dec 12, 2024 | GPU, MMLU | Code Available | 3 | 5 |
| LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding | Apr 25, 2024 | GSM8K, HellaSwag | Code Available | 3 | 5 |
| Are We Done with MMLU? | Jun 6, 2024 | MMLU, Virology | Code Available | 3 | 5 |
| General-Reasoner: Advancing LLM Reasoning Across All Domains | May 20, 2025 | All, Math | Code Available | 3 | 5 |
| ReasonIR: Training Retrievers for Reasoning Tasks | Apr 29, 2025 | Information Retrieval, MMLU | Code Available | 3 | 5 |
| MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark | Jun 3, 2024 | MMLU, Multi-task Language Understanding | Code Available | 3 | 5 |
| REPLUG: Retrieval-Augmented Black-Box Language Models | Jan 30, 2023 | Language Modeling, Language Modelling | Code Available | 3 | 5 |
| YourBench: Easy Custom Evaluation Sets for Everyone | Apr 2, 2025 | MMLU | Code Available | 3 | 5 |