| Mathador-LM: A Dynamic Benchmark for Mathematical Reasoning on Large Language Models | Jun 18, 2024 | Mathematical Reasoning | Code Available | 0 | 5 |
| ComplexFormer: Disruptively Advancing Transformer Inference Ability via Head-Specific Complex Vector Attention | May 15, 2025 | Code Generation, Language Modeling | Code Available | 0 | 5 |
| Give me a hint: Can LLMs take a hint to solve math problems? | Oct 8, 2024 | Adversarial Robustness, Math | Code Available | 0 | 5 |
| CoinMath: Harnessing the Power of Coding Instruction for Math LLMs | Dec 16, 2024 | Descriptive, Math | Code Available | 0 | 5 |
| ATHENA: Mathematical Reasoning with Thought Expansion | Nov 2, 2023 | Math, Mathematical Reasoning | Code Available | 0 | 5 |
| MAQA: Evaluating Uncertainty Quantification in LLMs Regarding Data Uncertainty | Aug 13, 2024 | Mathematical Reasoning, Question Answering | Code Available | 0 | 5 |
| Code Soliloquies for Accurate Calculations in Large Language Models | Sep 21, 2023 | Language Modelling, Large Language Model | Code Available | 0 | 5 |
| MAF: Multi-Aspect Feedback for Improving Reasoning in Large Language Models | Oct 19, 2023 | Hallucination, Mathematical Reasoning | Code Available | 0 | 5 |
| MARGE: Improving Math Reasoning for LLMs with Guided Exploration | May 18, 2025 | Math, Mathematical Reasoning | Code Available | 0 | 5 |
| Gap-Filling Prompting Enhances Code-Assisted Mathematical Reasoning | Nov 8, 2024 | Mathematical Reasoning | Code Available | 0 | 5 |