| Automatic Generation of Socratic Subquestions for Teaching Math Word Problems | Nov 23, 2022 | MathMath Word Problem Solving | CodeCode Available | 1 |
| Mathematical Capabilities of ChatGPT | Jan 31, 2023 | Elementary MathematicsMath | CodeCode Available | 1 |
| Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations | Dec 14, 2023 | Arithmetic ReasoningGSM8K | CodeCode Available | 1 |
| Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models | Mar 4, 2024 | Data AugmentationGSM8K | CodeCode Available | 1 |
| A Diverse Corpus for Evaluating and Developing English Math Word Problem Solvers | Jun 30, 2021 | DiversityMath | CodeCode Available | 1 |
| MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models | Feb 2, 2024 | Language ModellingLarge Language Model | CodeCode Available | 1 |
| MAgICoRe: Multi-Agent, Iterative, Coarse-to-Fine Refinement for Reasoning | Sep 18, 2024 | Math | CodeCode Available | 1 |
| M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models | Apr 14, 2025 | MambaMath | CodeCode Available | 1 |
| MathBERT: A Pre-trained Language Model for General NLP Tasks in Mathematics Education | Jun 2, 2021 | Knowledge TracingLanguage Modeling | CodeCode Available | 1 |
| LogQuant: Log-Distributed 2-Bit Quantization of KV Cache with Superior Accuracy Preservation | Mar 25, 2025 | Code CompletionLanguage Modeling | CodeCode Available | 1 |
| LLMThinkBench: Towards Basic Math Reasoning and Overthinking in Large Language Models | Jul 5, 2025 | BenchmarkingGPU | CodeCode Available | 1 |
| LoRA Soups: Merging LoRAs for Practical Skill Composition Tasks | Oct 16, 2024 | Mathparameter-efficient fine-tuning | CodeCode Available | 1 |
| CREATOR: Tool Creation for Disentangling Abstract and Concrete Reasoning of Large Language Models | May 23, 2023 | 2kMath | CodeCode Available | 1 |
| Leveraging Online Olympiad-Level Math Problems for LLMs Training and Contamination-Resistant Evaluation | Jan 24, 2025 | Math | CodeCode Available | 1 |
| Let's Verify Math Questions Step by Step | May 20, 2025 | MathMathematical Reasoning | CodeCode Available | 1 |
| LEVER: Learning to Verify Language-to-Code Generation with Execution | Feb 16, 2023 | Arithmetic ReasoningCode Generation | CodeCode Available | 1 |
| AutoBencher: Creating Salient, Novel, Difficult Datasets for Language Models | Jul 11, 2024 | Language ModellingMath | CodeCode Available | 1 |
| Augmenting Math Word Problems via Iterative Question Composing | Jan 17, 2024 | MathMathematical Reasoning | CodeCode Available | 1 |
| Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency | Apr 24, 2025 | BenchmarkingMath | CodeCode Available | 1 |
| Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs | Jun 24, 2024 | Instruction FollowingMath | CodeCode Available | 1 |
| Learning Math Reasoning from Self-Sampled Correct and Partially-Correct Solutions | May 28, 2022 | Arithmetic ReasoningEfficient Exploration | CodeCode Available | 1 |
| Learning by Fixing: Solving Math Word Problems with Weak Supervision | Dec 19, 2020 | MathWeakly-supervised Learning | CodeCode Available | 1 |
| Learning Goal-Conditioned Representations for Language Reward Models | Jul 18, 2024 | GSM8KMath | CodeCode Available | 1 |
| LASeR: Learning to Adaptively Select Reward Models with Multi-Armed Bandits | Oct 2, 2024 | Instruction FollowingMath | CodeCode Available | 1 |
| A Tree-Structured Decoder for Image-to-Markup Generation | Jan 1, 2020 | DecoderHandwritten Mathmatical Expression Recognition | CodeCode Available | 1 |
| DICE: Detecting In-distribution Contamination in LLM's Fine-tuning Phase for Math Reasoning | Jun 6, 2024 | Math | CodeCode Available | 1 |
| Large (Vision) Language Models are Unsupervised In-Context Learners | Apr 3, 2025 | GSM8KIn-Context Learning | CodeCode Available | 1 |
| Learning Multi-Step Reasoning by Solving Arithmetic Tasks | Jun 2, 2023 | MathMathematical Reasoning | CodeCode Available | 1 |
| Language Models Encode the Value of Numbers Linearly | Jan 8, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 1 |
| A*-Thought: Efficient Reasoning via Bidirectional Compression for Low-Resource Settings | May 30, 2025 | Math | CodeCode Available | 1 |
| Large Language Models Are Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning | Jan 27, 2023 | Few-Shot LearningGSM8K | CodeCode Available | 1 |
| Conic10K: A Challenging Math Problem Understanding and Reasoning Dataset | Nov 9, 2023 | MathNatural Language Understanding | CodeCode Available | 1 |
| Control LLM: Controlled Evolution for Intelligence Retention in LLM | Jan 19, 2025 | MathMathematical Reasoning | CodeCode Available | 1 |
| Learning From Mistakes Makes LLM Better Reasoner | Oct 31, 2023 | GSM8KMath | CodeCode Available | 1 |
| Language Models as Science Tutors | Feb 16, 2024 | GSM8KMath | CodeCode Available | 1 |
| Large Language Models Are Neurosymbolic Reasoners | Jan 17, 2024 | Common Sense ReasoningMath | CodeCode Available | 1 |
| Non-myopic Generation of Language Models for Reasoning and Planning | Oct 22, 2024 | Computational EfficiencyLanguage Modelling | CodeCode Available | 1 |
| Design of Chain-of-Thought in Math Problem Solving | Sep 20, 2023 | DiversityGSM8K | CodeCode Available | 1 |
| A Causal Framework to Quantify the Robustness of Mathematical Reasoning with Language Models | Oct 21, 2022 | MathMathematical Reasoning | CodeCode Available | 1 |
| CoT-based Synthesizer: Enhancing LLM Performance through Answer Synthesis | Jan 3, 2025 | Math | CodeCode Available | 1 |
| Autoformalize Mathematical Statements by Symbolic Equivalence and Semantic Consistency | Oct 28, 2024 | Math | CodeCode Available | 1 |
| Discovering Mathematical Objects of Interest -- A Study of Mathematical Notations | Feb 7, 2020 | Information RetrievalMath | CodeCode Available | 1 |
| Large Language Models Can Be Easily Distracted by Irrelevant Context | Jan 31, 2023 | Arithmetic ReasoningLanguage Modeling | CodeCode Available | 1 |
| Learning to Reason Deductively: Math Word Problem Solving as Complex Relation Extraction | Mar 19, 2022 | MathMath Word Problem Solving | CodeCode Available | 1 |
| MathViz-E: A Case-study in Domain-Specialized Tool-Using Agents | Jul 24, 2024 | Math | CodeCode Available | 1 |
| Is ChatGPT a Good Teacher Coach? Measuring Zero-Shot Performance For Scoring and Providing Actionable Insights on Classroom Instruction | Jun 5, 2023 | Math | CodeCode Available | 1 |
| ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large Language Models | Feb 22, 2024 | MathMathematical Reasoning | CodeCode Available | 1 |
| Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability | Nov 29, 2024 | GSM8KMath | CodeCode Available | 1 |
| JiuZhang3.0: Efficiently Improving Mathematical Reasoning by Training Small Data Synthesis Models | May 23, 2024 | Knowledge DistillationMath | CodeCode Available | 1 |
| JiuZhang: A Chinese Pre-trained Language Model for Mathematical Problem Understanding | Jun 13, 2022 | Language ModelingLanguage Modelling | CodeCode Available | 1 |