| CoinMath: Harnessing the Power of Coding Instruction for Math LLMs | Dec 16, 2024 | DescriptiveMath | CodeCode Available | 0 |
| Polymath: A Challenging Multi-modal Mathematical Reasoning Benchmark | Oct 6, 2024 | Mathematical ReasoningSpatial Reasoning | CodeCode Available | 0 |
| Planning and Editing What You Retrieve for Enhanced Tool Learning | Mar 30, 2024 | Mathematical ReasoningRetrieval | CodeCode Available | 0 |
| Pride and Prejudice: LLM Amplifies Self-Bias in Self-Refinement | Feb 18, 2024 | Mathematical ReasoningText Generation | CodeCode Available | 0 |
| Code Soliloquies for Accurate Calculations in Large Language Models | Sep 21, 2023 | Language ModellingLarge Language Model | CodeCode Available | 0 |
| Paraphrase and Solve: Exploring and Exploiting the Impact of Surface Form on Mathematical Reasoning in Large Language Models | Apr 17, 2024 | FormLanguage Model Evaluation | CodeCode Available | 0 |
| Overcoming Barriers to Skill Injection in Language Modeling: Case Study in Arithmetic | Nov 3, 2022 | Arithmetic ReasoningLanguage Modeling | CodeCode Available | 0 |
| Omni-DPO: A Dual-Perspective Paradigm for Dynamic Preference Learning of LLMs | Jun 11, 2025 | Mathematical Reasoning | CodeCode Available | 0 |
| Table Question Answering for Low-resourced Indic Languages | Oct 4, 2024 | Cross-Lingual TransferMathematical Reasoning | CodeCode Available | 0 |
| Evaluating Mathematical Reasoning of Large Language Models: A Focus on Error Identification and Correction | Jun 2, 2024 | Mathematical Reasoning | CodeCode Available | 0 |
| NUMCoT: Numerals and Units of Measurement in Chain-of-Thought Reasoning using Large Language Models | Jun 5, 2024 | MathMathematical Reasoning | CodeCode Available | 0 |
| A Survey on Mathematical Reasoning and Optimization with Large Language Models | Mar 22, 2025 | Automated Theorem ProvingHeuristic Search | CodeCode Available | 0 |
| Challenging the Boundaries of Reasoning: An Olympiad-Level Math Benchmark for Large Language Models | Mar 27, 2025 | Data VisualizationMath | CodeCode Available | 0 |
| Not All Votes Count! Programs as Verifiers Improve Self-Consistency of Language Models for Math Reasoning | Oct 16, 2024 | AllGSM8K | CodeCode Available | 0 |
| NeuralNexus at BEA 2025 Shared Task: Retrieval-Augmented Prompting for Mistake Identification in AI Tutors | Jun 12, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 0 |
| Teaching-Inspired Integrated Prompting Framework: A Novel Approach for Enhancing Reasoning in Large Language Models | Oct 10, 2024 | Arithmetic ReasoningMath | CodeCode Available | 0 |
| Multilingual Mathematical Reasoning: Advancing Open-Source LLMs in Hindi and English | Dec 24, 2024 | Mathematical Reasoning | CodeCode Available | 0 |
| Reasoning with Transformer-based Models: Deep Learning, but Shallow Reasoning | Jun 22, 2021 | Deep LearningLogical Reasoning | CodeCode Available | 0 |
| Error Typing for Smarter Rewards: Improving Process Reward Models with Error-Aware Hierarchical Supervision | May 26, 2025 | HallucinationMath | CodeCode Available | 0 |
| MultiLingPoT: Enhancing Mathematical Reasoning with Multilingual Program Fine-tuning | Dec 17, 2024 | Mathematical Reasoning | CodeCode Available | 0 |
| Multi-Agent Sampling: Scaling Inference Compute for Data Synthesis with Tree Search-Based Agentic Collaboration | Dec 22, 2024 | Decision MakingMachine Translation | CodeCode Available | 0 |
| MoD: A Distribution-Based Approach for Merging Large Language Models | Nov 1, 2024 | Mathematical Reasoning | CodeCode Available | 0 |
| MMATH: A Multilingual Benchmark for Mathematical Reasoning | May 25, 2025 | MathMathematical Reasoning | CodeCode Available | 0 |
| Transformers discover an elementary calculation system exploiting local attention and grid-like problem representation | Jul 6, 2022 | Mathematical Reasoning | CodeCode Available | 0 |
| MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO | May 19, 2025 | DecoderImage Generation | CodeCode Available | 0 |
| EquivPruner: Boosting Efficiency and Quality in LLM-Based Search via Action Pruning | May 22, 2025 | GSM8KMath | CodeCode Available | 0 |
| Techniques to Improve Neural Math Word Problem Solvers | Feb 6, 2023 | DecoderLanguage Modelling | CodeCode Available | 0 |
| CER: Confidence Enhanced Reasoning in LLMs | Feb 20, 2025 | MathMathematical Reasoning | CodeCode Available | 0 |
| Compositional Generalization with Tree Stack Memory Units | Nov 5, 2019 | Mathematical ReasoningZero-shot Generalization | CodeCode Available | 0 |
| Rethinking Fine-Tuning when Scaling Test-Time Compute: Limiting Confidence Improves Mathematical Reasoning | Feb 11, 2025 | Code GenerationMath | CodeCode Available | 0 |
| Template-Driven LLM-Paraphrased Framework for Tabular Math Word Problem Generation | Dec 20, 2024 | MathMathematical Reasoning | CodeCode Available | 0 |
| Temporal Consistency for LLM Reasoning Process Error Identification | Mar 18, 2025 | Mathematical Reasoning | CodeCode Available | 0 |
| MC-NEST -- Enhancing Mathematical Reasoning in Large Language Models with a Monte Carlo Nash Equilibrium Self-Refine Tree | Nov 23, 2024 | Decision MakingMathematical Reasoning | CodeCode Available | 0 |
| Reverse Operation based Data Augmentation for Solving Math Word Problems | Oct 4, 2020 | Data AugmentationMath | CodeCode Available | 0 |
| TRIGO: Benchmarking Formal Mathematical Proof Reduction for Generative Language Models | Oct 16, 2023 | Automated Theorem ProvingBenchmarking | CodeCode Available | 0 |
| A Survey of Deep Learning for Geometry Problem Solving | Jul 16, 2025 | Deep LearningGeometry Problem Solving | CodeCode Available | 0 |
| Can LLMs Solve longer Math Word Problems Better? | May 23, 2024 | Data AugmentationMath | CodeCode Available | 0 |
| TUBench: Benchmarking Large Vision-Language Models on Trustworthiness with Unanswerable Questions | Oct 5, 2024 | BenchmarkingHallucination | CodeCode Available | 0 |
| Enumerate-Conjecture-Prove: Formally Solving Answer-Construction Problems in Math Competitions | May 24, 2025 | Automated Theorem ProvingMath | CodeCode Available | 0 |
| Can LLMs Master Math? Investigating Large Language Models on Math Stack Exchange | Mar 30, 2024 | MathMathematical Problem-Solving | CodeCode Available | 0 |
| MCC-KD: Multi-CoT Consistent Knowledge Distillation | Oct 23, 2023 | DiversityKnowledge Distillation | CodeCode Available | 0 |
| Math Word Problem Solving by Generating Linguistic Variants of Problem Statements | Jun 24, 2023 | DecoderIngenuity | CodeCode Available | 0 |
| MATHSENSEI: A Tool-Augmented Large Language Model for Mathematical Reasoning | Feb 27, 2024 | 8kLanguage Modeling | CodeCode Available | 0 |
| Can A Gamer Train A Mathematical Reasoning Model? | Jun 10, 2025 | GPUMathematical Reasoning | CodeCode Available | 0 |
| MathScape: Evaluating MLLMs in multimodal Math Scenarios through a Hierarchical Benchmark | Aug 14, 2024 | MathMathematical Reasoning | CodeCode Available | 0 |
| RMoA: Optimizing Mixture-of-Agents through Diversity Maximization and Residual Compensation | May 30, 2025 | Code GenerationDiversity | CodeCode Available | 0 |
| Position: AI Evaluation Should Learn from How We Test Humans | Jun 18, 2023 | Mathematical ReasoningPosition | CodeCode Available | 0 |
| RoMath: A Mathematical Reasoning Benchmark in Romanian | Sep 17, 2024 | Mathematical Reasoning | CodeCode Available | 0 |
| MathScale: Scaling Instruction Tuning for Mathematical Reasoning | Mar 5, 2024 | GSM8KMath | CodeCode Available | 0 |
| Mathematical Reasoning in Large Language Models: Assessing Logical and Arithmetic Errors across Wide Numerical Ranges | Feb 12, 2025 | GSM8KMath | CodeCode Available | 0 |