| FRAMES: Boosting LLMs with A Four-Quadrant Multi-Stage Pretraining Strategy | Feb 8, 2025 | MMLU | —Unverified | 0 |
| Adapt-Pruner: Adaptive Structural Pruning for Efficient Small Language Model Training | Feb 5, 2025 | Language ModelingLanguage Modelling | —Unverified | 0 |
| QLESS: A Quantized Approach for Data Valuation and Selection in Large Language Model Fine-Tuning | Feb 3, 2025 | Data ValuationLanguage Modeling | CodeCode Available | 0 |
| Evaluation of Large Language Models via Coupled Token Generation | Feb 3, 2025 | ChatbotLarge Language Model | CodeCode Available | 0 |
| Rethinking Mixture-of-Agents: Is Mixing Different Large Language Models Beneficial? | Feb 2, 2025 | MathMMLU | —Unverified | 0 |
| LLM-Powered Benchmark Factory: Reliable, Generic, and Efficient | Feb 2, 2025 | MMLU | CodeCode Available | 0 |
| DFPE: A Diverse Fingerprint Ensemble for Enhancing LLM Performance | Jan 29, 2025 | DiversityMMLU | CodeCode Available | 0 |
| IndicMMLU-Pro: Benchmarking Indic Large Language Models on Multi-Task Language Understanding | Jan 27, 2025 | BenchmarkingDiversity | —Unverified | 0 |
| HardML: A Benchmark For Evaluating Data Science And Machine Learning knowledge and reasoning in AI | Jan 26, 2025 | MMLUMultiple-choice | —Unverified | 0 |
| Humanity's Last Exam | Jan 24, 2025 | Humanity's Last ExamLanguage Modeling | —Unverified | 0 |