| Leveraging Training Data in Few-Shot Prompting for Numerical Reasoning | May 29, 2023 | Language Modelling, Large Language Model | Code Available | 0 | 5 |
| Automated Distractor and Feedback Generation for Math Multiple-choice Questions via In-context Learning | Aug 7, 2023 | In-Context Learning, Math | Code Available | 0 | 5 |
| Leveraging Web-Crawled Data for High-Quality Fine-Tuning | Aug 15, 2024 | Language Modeling, Language Modelling | Code Available | 0 | 5 |
| Can We Use Small Models to Investigate Multimodal Fusion Methods? | Sep 1, 2022 | Math | Code Available | 0 | 5 |
| Learning to Solve Geometry Problems via Simulating Human Dual-Reasoning Process | May 10, 2024 | Geometry Problem Solving, Machine Translation | Code Available | 0 | 5 |
| Leveraging Label Semantics and Meta-Label Refinement for Multi-Label Question Classification | Nov 4, 2024 | Math, Reranking | Code Available | 0 | 5 |
| Can Vision-Language Models Evaluate Handwritten Math? | Jan 13, 2025 | Math | Code Available | 0 | 5 |
| AI-Assisted Generation of Difficult Math Questions | Jul 30, 2024 | Math, Mathematical Reasoning | Code Available | 0 | 5 |
| Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning | Feb 24, 2025 | Math, Mathematical Reasoning | Code Available | 0 | 5 |
| MathScape: Evaluating MLLMs in multimodal Math Scenarios through a Hierarchical Benchmark | Aug 14, 2024 | Math, Mathematical Reasoning | Code Available | 0 | 5 |
| OntoMath^PRO Ontology: A Linked Data Hub for Mathematics | Jul 17, 2014 | Math | Code Available | 0 | 5 |
| Examining the Robustness of Large Language Models across Language Complexity | Jan 30, 2025 | Math | Unverified | 0 | 0 |
| Examining the Behavior of LLM Architectures Within the Framework of Standardized National Exams in Brazil | Aug 9, 2024 | Math, Multiple-choice | Unverified | 0 | 0 |
| Can Stories Help LLMs Reason? Curating Information Space Through Narrative | Oct 25, 2024 | Math | Unverified | 0 | 0 |
| Evolving LLMs' Self-Refinement Capability via Iterative Preference Optimization | Feb 8, 2025 | GSM8K, Math | Unverified | 0 | 0 |
| Can LLMs understand Math? -- Exploring the Pitfalls in Mathematical Reasoning | May 21, 2025 | Math, Mathematical Reasoning | Unverified | 0 | 0 |
| A range characterization of the single-quadrant ADRT | Oct 11, 2020 | Math | Unverified | 0 | 0 |
| EvoGPT-f: An Evolutionary GPT Framework for Benchmarking Formal Math Languages | Feb 12, 2024 | Automated Theorem Proving, Benchmarking | Unverified | 0 | 0 |
| AI4Math: A Native Spanish Benchmark for University-Level Mathematical Reasoning in Large Language Models | May 25, 2025 | Math, Mathematical Reasoning | Unverified | 0 | 0 |
| Evaluating the Design Features of an Intelligent Tutoring System for Advanced Mathematics Learning | Dec 23, 2024 | Math | Unverified | 0 | 0 |
| Evaluating Robustness of Reward Models for Mathematical Reasoning | Oct 2, 2024 | Math, Mathematical Reasoning | Unverified | 0 | 0 |
| Can LLMs Reason Abstractly Over Math Word Problems Without CoT? Disentangling Abstract Formulation From Arithmetic Computation | May 29, 2025 | GSM8K, Math | Unverified | 0 | 0 |
| Evaluating Grounded Reasoning by Code-Assisted Large Language Models for Mathematics | Apr 24, 2025 | Code Generation, Math | Unverified | 0 | 0 |
| Evaluating GPT-4 at Grading Handwritten Solutions in Math Exams | Nov 7, 2024 | Math | Unverified | 0 | 0 |
| A Graph-Based Synthetic Data Pipeline for Scaling High-Quality Reasoning Instructions | Dec 12, 2024 | GSM8K, Knowledge Graphs | Unverified | 0 | 0 |