| MR-GSM8K: A Meta-Reasoning Benchmark for Large Language Model Evaluation | Dec 28, 2023 | GSM8KLanguage Model Evaluation | CodeCode Available | 1 | 5 |
| Problem-Oriented Segmentation and Retrieval: Case Study on Tutoring Conversations | Nov 12, 2024 | MathRetrieval | CodeCode Available | 1 | 5 |
| Expression Syntax Information Bottleneck for Math Word Problems | Oct 24, 2023 | Math | CodeCode Available | 1 | 5 |
| Evaluating and Improving Tool-Augmented Computation-Intensive Math Reasoning | Jun 4, 2023 | Math | CodeCode Available | 1 | 5 |
| Plan, Verify and Switch: Integrated Reasoning with Diverse X-of-Thoughts | Oct 23, 2023 | Logical ReasoningMath | CodeCode Available | 1 | 5 |
| EXAONE Deep: Reasoning Enhanced Language Models | Mar 16, 2025 | Math | CodeCode Available | 1 | 5 |
| Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Better Solvers for Math Word Problems | Apr 23, 2024 | Arithmetic ReasoningGSM8K | CodeCode Available | 1 | 5 |
| Explaining Datasets in Words: Statistical Models with Natural Language Parameters | Sep 13, 2024 | ClusteringLanguage Modeling | CodeCode Available | 1 | 5 |
| Are NLP Models really able to Solve Simple Math Word Problems? | Mar 12, 2021 | MathMath Word Problem Solving | CodeCode Available | 1 | 5 |
| Case-Based or Rule-Based: How Do Transformers Do the Math? | Feb 27, 2024 | MathSystematic Generalization | CodeCode Available | 1 | 5 |