| Instance-adaptive Zero-shot Chain-of-Thought Prompting | Sep 30, 2024 | GSM8K, Math | —Unverified | 0 | 0 |
| Instruction Tuning with Human Curriculum | Oct 14, 2023 | ARC, MMLU | —Unverified | 0 | 0 |
| Integrating External Tools with Large Language Models to Improve Accuracy | Jul 9, 2025 | Mathematical Reasoning, MMLU | —Unverified | 0 | 0 |
| Interleaved Reasoning for Large Language Models via Reinforcement Learning | May 26, 2025 | Logical Reasoning, Math | —Unverified | 0 | 0 |
| Investigating Data Contamination in Modern Benchmarks for Large Language Models | Nov 16, 2023 | Common Sense Reasoning, MMLU | —Unverified | 0 | 0 |
| Irreducible Curriculum for Language Model Pretraining | Oct 23, 2023 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| Is your LLM trapped in a Mental Set? Investigative study on how mental sets affect the reasoning capabilities of LLMs | Jan 21, 2025 | GSM8K, In-Context Learning | —Unverified | 0 | 0 |
| KorMedMCQA: Multi-Choice Question Answering Benchmark for Korean Healthcare Professional Licensing Examinations | Mar 3, 2024 | MedQA, MMLU | —Unverified | 0 | 0 |
| KRISTEVA: Close Reading as a Novel Task for Benchmarking Interpretive Reasoning | May 14, 2025 | Benchmarking, MMLU | —Unverified | 0 | 0 |
| KurTail : Kurtosis-based LLM Quantization | Mar 3, 2025 | GPU, Language Modeling | —Unverified | 0 | 0 |