| ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large Language Models | Feb 22, 2024 | Math, Mathematical Reasoning | Code Available | 1 | 5 |
| Augmenting Math Word Problems via Iterative Question Composing | Jan 17, 2024 | Math, Mathematical Reasoning | Code Available | 1 | 5 |
| HARDMath: A Benchmark Dataset for Challenging Problems in Applied Mathematics | Oct 13, 2024 | Language Modeling, Language Modelling | Code Available | 1 | 5 |
| Open Eyes, Then Reason: Fine-grained Visual Mathematical Understanding in MLLMs | Jan 11, 2025 | Math, Mathematical Problem-Solving | Code Available | 1 | 5 |
| Question Translation Training for Better Multilingual Reasoning | Jan 15, 2024 | Mathematical Reasoning, Translation | Code Available | 1 | 5 |
| Process-Driven Autoformalization in Lean 4 | Jun 4, 2024 | Mathematical Reasoning | Code Available | 1 | 5 |
| GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models | Oct 7, 2024 | GSM8K, Logical Reasoning | Code Available | 1 | 5 |
| Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning | May 30, 2025 | Math, Mathematical Reasoning | Code Available | 1 | 5 |
| PromptCoT: Synthesizing Olympiad-level Problems for Mathematical Reasoning in Large Language Models | Mar 4, 2025 | GSM8K, Math | Code Available | 1 | 5 |
| CoMAT: Chain of Mathematically Annotated Thought Improves Mathematical Reasoning | Oct 14, 2024 | Math, Mathematical Reasoning | Code Available | 1 | 5 |