| Multiple-Choice Questions are Efficient and Robust LLM Evaluators | May 20, 2024 | GSM8KHumanEval | CodeCode Available | 1 | 5 |
| Graph-to-Tree Learning for Solving Math Word Problems | Jul 1, 2020 | DecoderMath | CodeCode Available | 1 | 5 |
| MultiMath: Bridging Visual and Mathematical Reasoning for Large Language Models | Aug 30, 2024 | Image CaptioningLanguage Modeling | CodeCode Available | 1 | 5 |
| GOLD: Geometry Problem Solver with Natural Language Description | May 1, 2024 | Math | CodeCode Available | 1 | 5 |
| Modeling Complex Mathematical Reasoning via Large Language Model based MathAgent | Dec 14, 2023 | Language ModelingLanguage Modelling | CodeCode Available | 1 | 5 |
| Automatic Generation of Socratic Subquestions for Teaching Math Word Problems | Nov 23, 2022 | MathMath Word Problem Solving | CodeCode Available | 1 | 5 |
| GeoQA: A Geometric Question Answering Benchmark Towards Multimodal Numerical Reasoning | May 30, 2021 | MathMathematical Reasoning | CodeCode Available | 1 | 5 |
| GeoEval: Benchmark for Evaluating LLMs and Multi-Modal Models on Geometry Problem-Solving | Feb 15, 2024 | Geometry Problem SolvingMath | CodeCode Available | 1 | 5 |
| A Diverse Corpus for Evaluating and Developing English Math Word Problem Solvers | Jun 30, 2021 | DiversityMath | CodeCode Available | 1 | 5 |
| Get an A in Math: Progressive Rectification Prompting | Dec 11, 2023 | Math | CodeCode Available | 1 | 5 |
| Generating Pedagogically Meaningful Visuals for Math Word Problems: A New Benchmark and Analysis of Text-to-Image Models | Jun 4, 2025 | Math | CodeCode Available | 1 | 5 |
| Mining Mathematical Documents for Question Answering via Unsupervised Formula Labeling | Nov 12, 2022 | Entity LinkingKnowledge Graphs | CodeCode Available | 1 | 5 |
| ModelingAgent: Bridging LLMs and Mathematical Modeling for Real-World Challenges | May 21, 2025 | Mathvalid | CodeCode Available | 1 | 5 |
| A Relation Spectrum Inheriting Taylor Series: Muscle Synergy and Coupling for Hand | Apr 25, 2020 | MathRelation | CodeCode Available | 1 | 5 |
| Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability | Nov 29, 2024 | GSM8KMath | CodeCode Available | 1 | 5 |
| Memory-Efficient and Secure DNN Inference on TrustZone-enabled Consumer IoT Devices | Mar 19, 2024 | Math | CodeCode Available | 1 | 5 |
| CREATOR: Tool Creation for Disentangling Abstract and Concrete Reasoning of Large Language Models | May 23, 2023 | 2kMath | CodeCode Available | 1 | 5 |
| From GAN to WGAN | Apr 18, 2019 | Generative Adversarial NetworkMath | CodeCode Available | 1 | 5 |
| From Zero to Hero: Convincing with Extremely Complicated Math | Apr 1, 2023 | Math | CodeCode Available | 1 | 5 |
| Autoformalize Mathematical Statements by Symbolic Equivalence and Semantic Consistency | Oct 28, 2024 | Math | CodeCode Available | 1 | 5 |
| Forgotten Polygons: Multimodal Large Language Models are Shape-Blind | Feb 21, 2025 | MathMathematical Problem-Solving | CodeCode Available | 1 | 5 |
| MedCaseReasoning: Evaluating and learning diagnostic reasoning from clinical case reports | May 16, 2025 | DiagnosticMath | CodeCode Available | 1 | 5 |
| Fine-Tuning Large Language Models on Quantum Optimization Problems for Circuit Generation | Apr 15, 2025 | MathQuantum Machine Learning | CodeCode Available | 1 | 5 |
| Augmenting Math Word Problems via Iterative Question Composing | Jan 17, 2024 | MathMathematical Reasoning | CodeCode Available | 1 | 5 |
| Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency | Apr 24, 2025 | BenchmarkingMath | CodeCode Available | 1 | 5 |
| FormulaNet: A Benchmark Dataset for Mathematical Formula Detection | Aug 29, 2022 | Math | CodeCode Available | 1 | 5 |
| Control LLM: Controlled Evolution for Intelligence Retention in LLM | Jan 19, 2025 | MathMathematical Reasoning | CodeCode Available | 1 | 5 |
| Math Word Problem Solving with Explicit Numerical Values | Aug 1, 2021 | MathMath Word Problem Solving | CodeCode Available | 1 | 5 |
| MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models | Apr 8, 2025 | MathMultimodal Reasoning | CodeCode Available | 1 | 5 |
| Explaining Datasets in Words: Statistical Models with Natural Language Parameters | Sep 13, 2024 | ClusteringLanguage Modeling | CodeCode Available | 1 | 5 |
| A Tree-Structured Decoder for Image-to-Markup Generation | Jan 1, 2020 | DecoderHandwritten Mathmatical Expression Recognition | CodeCode Available | 1 | 5 |
| Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective | Jun 22, 2025 | In-Context LearningLarge Language Model | CodeCode Available | 1 | 5 |
| EXAONE Deep: Reasoning Enhanced Language Models | Mar 16, 2025 | Math | CodeCode Available | 1 | 5 |
| Expression Syntax Information Bottleneck for Math Word Problems | Oct 24, 2023 | Math | CodeCode Available | 1 | 5 |
| MATHWELL: Generating Educational Math Word Problems Using Teacher Annotations | Feb 24, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 1 | 5 |
| Measuring Conversational Uptake: A Case Study on Student-Teacher Interactions | Jun 7, 2021 | MathQuestion Answering | CodeCode Available | 1 | 5 |
| AutoBencher: Creating Salient, Novel, Difficult Datasets for Language Models | Jul 11, 2024 | Language ModellingMath | CodeCode Available | 1 | 5 |
| FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving | Feb 27, 2025 | GSM8KMath | CodeCode Available | 1 | 5 |
| Mathfish: Evaluating Language Model Math Reasoning via Grounding in Educational Curricula | Aug 8, 2024 | GSM8KLanguage Modeling | CodeCode Available | 1 | 5 |
| CoT-based Synthesizer: Enhancing LLM Performance through Answer Synthesis | Jan 3, 2025 | Math | CodeCode Available | 1 | 5 |
| A*-Thought: Efficient Reasoning via Bidirectional Compression for Low-Resource Settings | May 30, 2025 | Math | CodeCode Available | 1 | 5 |
| Conic10K: A Challenging Math Problem Understanding and Reasoning Dataset | Nov 9, 2023 | MathNatural Language Understanding | CodeCode Available | 1 | 5 |
| Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations | Dec 14, 2023 | Arithmetic ReasoningGSM8K | CodeCode Available | 1 | 5 |
| Evaluating and Improving Tool-Augmented Computation-Intensive Math Reasoning | Jun 4, 2023 | Math | CodeCode Available | 1 | 5 |
| EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees | Mar 11, 2025 | ChatbotLanguage Modeling | CodeCode Available | 1 | 5 |
| FELM: Benchmarking Factuality Evaluation of Large Language Models | Oct 1, 2023 | BenchmarkingMath | CodeCode Available | 1 | 5 |
| A Causal Framework to Quantify the Robustness of Mathematical Reasoning with Language Models | Oct 21, 2022 | MathMathematical Reasoning | CodeCode Available | 1 | 5 |
| MathViz-E: A Case-study in Domain-Specialized Tool-Using Agents | Jul 24, 2024 | Math | CodeCode Available | 1 | 5 |
| MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data | Feb 14, 2024 | Automated Theorem ProvingLanguage Modelling | CodeCode Available | 1 | 5 |
| Entropy-Based Adaptive Weighting for Self-Training | Mar 31, 2025 | GSM8KMath | CodeCode Available | 1 | 5 |