| Inconsistencies in Masked Language Models | Dec 30, 2022 | LAMBADA, MMLU | Code Available | 0 | 5 |
| Do Large Language Models Perform the Way People Expect? Measuring the Human Generalization Function | Jun 3, 2024 | Diversity, MMLU | Code Available | 0 | 5 |
| Review-Instruct: A Review-Driven Multi-Turn Conversations Generation Method for Large Language Models | May 16, 2025 | Diversity, MMLU | Code Available | 0 | 5 |
| OpenGrok: Enhancing SNS Data Processing with Distilled Knowledge and Mask-like Mechanisms | Feb 11, 2025 | Knowledge Distillation, MMLU | Code Available | 0 | 5 |
| Do Language Models Mirror Human Confidence? Exploring Psychological Insights to Address Overconfidence in LLMs | May 31, 2025 | MMLU | Code Available | 0 | 5 |
| BnMMLU: Measuring Massive Multitask Language Understanding in Bengali | May 25, 2025 | General Knowledge, MMLU | Code Available | 0 | 5 |
| MMLU-Pro+: Evaluating Higher-Order Reasoning and Shortcut Learning in LLMs | Sep 3, 2024 | MMLU | Code Available | 0 | 5 |
| Divide, Reweight, and Conquer: A Logit Arithmetic Approach for In-Context Learning | Oct 14, 2024 | In-Context Learning, MMLU | Code Available | 0 | 5 |
| DFPE: A Diverse Fingerprint Ensemble for Enhancing LLM Performance | Jan 29, 2025 | Diversity, MMLU | Code Available | 0 | 5 |
| MATHSENSEI: A Tool-Augmented Large Language Model for Mathematical Reasoning | Feb 27, 2024 | 8k, Language Modeling | Code Available | 0 | 5 |