| Open Eyes, Then Reason: Fine-grained Visual Mathematical Understanding in MLLMs | Jan 11, 2025 | MathMathematical Problem-Solving | CodeCode Available | 1 | 5 |
| Measuring Conversational Uptake: A Case Study on Student-Teacher Interactions | Jun 7, 2021 | MathQuestion Answering | CodeCode Available | 1 | 5 |
| Mathfish: Evaluating Language Model Math Reasoning via Grounding in Educational Curricula | Aug 8, 2024 | GSM8KLanguage Modeling | CodeCode Available | 1 | 5 |
| OJBench: A Competition Level Code Benchmark For Large Language Models | Jun 19, 2025 | Math | CodeCode Available | 1 | 5 |
| ChatCoT: Tool-Augmented Chain-of-Thought Reasoning on Chat-based Large Language Models | May 23, 2023 | Math | CodeCode Available | 1 | 5 |
| ArMATH: a Dataset for Solving Arabic Math Word Problems | Jun 1, 2022 | Deep LearningMath | CodeCode Available | 1 | 5 |
| Evaluating and Improving Tool-Augmented Computation-Intensive Math Reasoning | Jun 4, 2023 | Math | CodeCode Available | 1 | 5 |
| Ape210K: A Large-Scale and Template-Rich Dataset of Math Word Problems | Sep 24, 2020 | DiversityMath | CodeCode Available | 1 | 5 |
| Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics | Oct 28, 2024 | Arithmetic ReasoningMath | CodeCode Available | 1 | 5 |
| Over-Reasoning and Redundant Calculation of Large Language Models | Jan 21, 2024 | GSM8KMath | CodeCode Available | 1 | 5 |