| Modeling Complex Mathematical Reasoning via Large Language Model based MathAgent | Dec 14, 2023 | Language Modeling, Language Modelling | Code Available | 1 | 5 |
| IsarStep: a Benchmark for High-level Mathematical Reasoning | Jun 13, 2020 | Mathematical Proofs, Mathematical Reasoning | Code Available | 1 | 5 |
| HARDMath: A Benchmark Dataset for Challenging Problems in Applied Mathematics | Oct 13, 2024 | Language Modeling, Language Modelling | Code Available | 1 | 5 |
| Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning | May 30, 2025 | Math, Mathematical Reasoning | Code Available | 1 | 5 |
| Breaking the Data Barrier -- Building GUI Agents Through Task Generalization | Apr 14, 2025 | Mathematical Reasoning, Multimodal Reasoning | Code Available | 1 | 5 |
| Breaking Language Barriers in Multilingual Mathematical Reasoning: Insights and Observations | Oct 31, 2023 | GSM8K, Math | Code Available | 1 | 5 |
| Are Large Language Models Really Good Logical Reasoners? A Comprehensive Evaluation and Beyond | Jun 16, 2023 | Benchmarking, Evidence Selection | Code Available | 1 | 5 |
| Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency | Apr 24, 2025 | Benchmarking, Math | Code Available | 1 | 5 |
| A Reinforcement Learning Environment for Mathematical Reasoning via Program Synthesis | Jul 15, 2021 | Mathematical Reasoning, Program Synthesis | Code Available | 1 | 5 |
| MetaLadder: Ascending Mathematical Solution Quality via Analogical-Problem Reasoning Transfer | Mar 19, 2025 | Answer Generation, Mathematical Reasoning | Code Available | 1 | 5 |