| Learning to Solve Geometry Problems via Simulating Human Dual-Reasoning Process | May 10, 2024 | Geometry Problem Solving, Machine Translation | Code Available | 0 | 5 |
| Examining the Robustness of Large Language Models across Language Complexity | Jan 30, 2025 | Math | Unverified | 0 | 0 |
| Examining the Behavior of LLM Architectures Within the Framework of Standardized National Exams in Brazil | Aug 9, 2024 | Math, Multiple-choice | Unverified | 0 | 0 |
| Can Stories Help LLMs Reason? Curating Information Space Through Narrative | Oct 25, 2024 | Math | Unverified | 0 | 0 |
| Evolving LLMs' Self-Refinement Capability via Iterative Preference Optimization | Feb 8, 2025 | GSM8K, Math | Unverified | 0 | 0 |
| Can LLMs understand Math? -- Exploring the Pitfalls in Mathematical Reasoning | May 21, 2025 | Math, Mathematical Reasoning | Unverified | 0 | 0 |
| A range characterization of the single-quadrant ADRT | Oct 11, 2020 | Math | Unverified | 0 | 0 |
| EvoGPT-f: An Evolutionary GPT Framework for Benchmarking Formal Math Languages | Feb 12, 2024 | Automated Theorem Proving, Benchmarking | Unverified | 0 | 0 |
| AI4Math: A Native Spanish Benchmark for University-Level Mathematical Reasoning in Large Language Models | May 25, 2025 | Math, Mathematical Reasoning | Unverified | 0 | 0 |
| Evaluating the Design Features of an Intelligent Tutoring System for Advanced Mathematics Learning | Dec 23, 2024 | Math | Unverified | 0 | 0 |