| MMLU-Pro+: Evaluating Higher-Order Reasoning and Shortcut Learning in LLMs | Sep 3, 2024 | MMLU | Code Available | 0 | 5 |
| Probing then Editing Response Personality of Large Language Models | Apr 14, 2025 | MMLU | Code Available | 0 | 5 |
| MATHSENSEI: A Tool-Augmented Large Language Model for Mathematical Reasoning | Feb 27, 2024 | 8k, Language Modeling | Code Available | 0 | 5 |
| BenTo: Benchmark Task Reduction with In-Context Transferability | Oct 17, 2024 | In-Context Learning, MMLU | Code Available | 0 | 5 |
| Input Conditioned Graph Generation for Language Agents | Jun 17, 2024 | Graph Generation, MMLU | Code Available | 0 | 5 |
| Inference-Time Decontamination: Reusing Leaked Benchmarks for Large Language Model Evaluation | Jun 20, 2024 | GSM8K, Language Model Evaluation | Code Available | 0 | 5 |
| LoTA-QAF: Lossless Ternary Adaptation for Quantization-Aware Fine-Tuning | May 24, 2025 | Computational Efficiency, MMLU | Code Available | 0 | 5 |
| Inconsistencies in Masked Language Models | Dec 30, 2022 | LAMBADA, MMLU | Code Available | 0 | 5 |
| Divide, Reweight, and Conquer: A Logit Arithmetic Approach for In-Context Learning | Oct 14, 2024 | In-Context Learning, MMLU | Code Available | 0 | 5 |
| LLM-Powered Benchmark Factory: Reliable, Generic, and Efficient | Feb 2, 2025 | MMLU | Code Available | 0 | 5 |