| HARDMath: A Benchmark Dataset for Challenging Problems in Applied Mathematics | Oct 13, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 1 |
| Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing | Apr 18, 2024 | Arithmetic ReasoningGSM8K | CodeCode Available | 1 |
| Graph-to-Tree Learning for Solving Math Word Problems | Jul 1, 2020 | DecoderMath | CodeCode Available | 1 |
| Entropy-Regularized Process Reward Model | Dec 15, 2024 | GSM8KMath | CodeCode Available | 1 |
| EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees | Mar 11, 2025 | ChatbotLanguage Modeling | CodeCode Available | 1 |
| Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Better Solvers for Math Word Problems | Apr 23, 2024 | Arithmetic ReasoningGSM8K | CodeCode Available | 1 |
| GOLD: Geometry Problem Solver with Natural Language Description | May 1, 2024 | Math | CodeCode Available | 1 |
| Graph-to-Tree Neural Networks for Learning Structured Input-Output Translation with Applications to Semantic Parsing and Math Word Problem | Apr 7, 2020 | DecoderMachine Translation | CodeCode Available | 1 |
| Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning | May 30, 2025 | MathMathematical Reasoning | CodeCode Available | 1 |
| Are NLP Models really able to Solve Simple Math Word Problems? | Mar 12, 2021 | MathMath Word Problem Solving | CodeCode Available | 1 |