| Natural Language Reasoning, A Survey | Mar 26, 2023 | Logical ReasoningMathematical Reasoning | CodeCode Available | 1 |
| Process-Driven Autoformalization in Lean 4 | Jun 4, 2024 | Mathematical Reasoning | CodeCode Available | 1 |
| Modeling Complex Mathematical Reasoning via Large Language Model based MathAgent | Dec 14, 2023 | Language ModelingLanguage Modelling | CodeCode Available | 1 |
| IsarStep: a Benchmark for High-level Mathematical Reasoning | Jun 13, 2020 | Mathematical ProofsMathematical Reasoning | CodeCode Available | 1 |
| A Practical Two-Stage Recipe for Mathematical LLMs: Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning | Jul 11, 2025 | MathMathematical Reasoning | CodeCode Available | 1 |
| DRA-GRPO: Exploring Diversity-Aware Reward Adjustment for R1-Zero-Like Training of Large Language Models | May 14, 2025 | DiversityMathematical Reasoning | CodeCode Available | 1 |
| DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning | Jul 4, 2024 | AvgGSM8K | CodeCode Available | 1 |
| MultiMath: Bridging Visual and Mathematical Reasoning for Large Language Models | Aug 30, 2024 | Image CaptioningLanguage Modeling | CodeCode Available | 1 |
| Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations | Dec 14, 2023 | Arithmetic ReasoningGSM8K | CodeCode Available | 1 |
| RealMath: A Continuous Benchmark for Evaluating Language Models on Research-Level Mathematics | May 18, 2025 | Mathematical Reasoning | CodeCode Available | 1 |
| Math-PUMA: Progressive Upward Multimodal Alignment to Enhance Mathematical Reasoning | Aug 16, 2024 | MathMathematical Reasoning | CodeCode Available | 1 |
| MathFusion: Enhancing Mathematic Problem-solving of LLM through Instruction Fusion | Mar 20, 2025 | Data AugmentationMathematical Problem-Solving | CodeCode Available | 1 |
| Mathematical Capabilities of ChatGPT | Jan 31, 2023 | Elementary MathematicsMath | CodeCode Available | 1 |
| MathPrompter: Mathematical Reasoning using Large Language Models | Mar 4, 2023 | Arithmetic ReasoningMath | CodeCode Available | 1 |
| Diagram Formalization Enhanced Multi-Modal Geometry Problem Solver | Sep 6, 2024 | Geometry Problem SolvingMathematical Reasoning | CodeCode Available | 1 |
| Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning | May 30, 2025 | MathMathematical Reasoning | CodeCode Available | 1 |
| An In-depth Look at Gemini's Language Abilities | Dec 18, 2023 | Instruction FollowingMath | CodeCode Available | 1 |
| MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Interactions | May 29, 2024 | BenchmarkingDialogue Understanding | CodeCode Available | 1 |
| DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning | May 29, 2025 | Automated Theorem ProvingMathematical Reasoning | CodeCode Available | 1 |
| OptiBench Meets ReSocratic: Measure and Improve LLMs for Optimization Modeling | Jul 13, 2024 | BenchmarkingMath | CodeCode Available | 1 |
| A Dual-Space Framework for General Knowledge Distillation of Large Language Models | Apr 15, 2025 | Code GenerationGeneral Knowledge | CodeCode Available | 1 |
| GeoQA: A Geometric Question Answering Benchmark Towards Multimodal Numerical Reasoning | May 30, 2021 | MathMathematical Reasoning | CodeCode Available | 1 |
| MetaLadder: Ascending Mathematical Solution Quality via Analogical-Problem Reasoning Transfer | Mar 19, 2025 | Answer GenerationMathematical Reasoning | CodeCode Available | 1 |
| Open Eyes, Then Reason: Fine-grained Visual Mathematical Understanding in MLLMs | Jan 11, 2025 | MathMathematical Problem-Solving | CodeCode Available | 1 |
| A Neural Network Solves, Explains, and Generates University Math Problems by Program Synthesis and Few-Shot Learning at Human Level | Dec 31, 2021 | Few-Shot LearningLanguage Modelling | CodeCode Available | 1 |
| Crosslingual Reasoning through Test-Time Scaling | May 8, 2025 | Mathematical Reasoning | CodeCode Available | 1 |
| CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization | Jul 8, 2025 | Active LearningAutomated Theorem Proving | CodeCode Available | 1 |
| Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability | Nov 29, 2024 | GSM8KMath | CodeCode Available | 1 |
| A Causal Framework to Quantify the Robustness of Mathematical Reasoning with Language Models | Oct 21, 2022 | MathMathematical Reasoning | CodeCode Available | 1 |
| MAPO: Advancing Multilingual Reasoning through Multilingual Alignment-as-Preference Optimization | Jan 12, 2024 | Mathematical Reasoning | CodeCode Available | 1 |
| Auto-Regressive Next-Token Predictors are Universal Learners | Sep 13, 2023 | Mathematical ReasoningText Generation | CodeCode Available | 1 |
| Living in the Moment: Can Large Language Models Grasp Co-Temporal Reasoning? | Jun 13, 2024 | Mathematical ReasoningQuestion Answering | CodeCode Available | 1 |
| CoT-UQ: Improving Response-wise Uncertainty Quantification in LLMs with Chain-of-Thought | Feb 24, 2025 | Mathematical ReasoningMisinformation | CodeCode Available | 1 |
| AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence | Feb 19, 2025 | Code GenerationDecision Making | CodeCode Available | 1 |
| LIME: Learning Inductive Bias for Primitives of Mathematical Reasoning | Jan 15, 2021 | Inductive BiasMathematical Reasoning | CodeCode Available | 1 |
| A Multi-Modal Neural Geometric Solver with Textual Clauses Parsed from Diagram | Feb 22, 2023 | Geometry Problem SolvingMathematical Reasoning | CodeCode Available | 1 |
| Control LLM: Controlled Evolution for Intelligence Retention in LLM | Jan 19, 2025 | MathMathematical Reasoning | CodeCode Available | 1 |
| LIMOPro: Reasoning Refinement for Efficient and Effective Test-time Scaling | May 25, 2025 | Computational EfficiencyMathematical Reasoning | CodeCode Available | 1 |
| Ada-Instruct: Adapting Instruction Generators for Complex Reasoning | Oct 6, 2023 | Code CompletionIn-Context Learning | CodeCode Available | 1 |
| IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning | Oct 25, 2021 | Arithmetic ReasoningMathematical Question Answering | CodeCode Available | 1 |
| LogicVista: Multimodal LLM Logical Reasoning Benchmark in Visual Contexts | Jul 6, 2024 | Logical ReasoningMathematical Reasoning | CodeCode Available | 1 |
| Learning From Mistakes Makes LLM Better Reasoner | Oct 31, 2023 | GSM8KMath | CodeCode Available | 1 |
| Learning Multi-Step Reasoning by Solving Arithmetic Tasks | Jun 2, 2023 | MathMathematical Reasoning | CodeCode Available | 1 |
| ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large Language Models | Feb 22, 2024 | MathMathematical Reasoning | CodeCode Available | 1 |
| Augmenting Math Word Problems via Iterative Question Composing | Jan 17, 2024 | MathMathematical Reasoning | CodeCode Available | 1 |
| Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language Models as Agents | Feb 18, 2024 | Mathematical ReasoningMulti-hop Question Answering | CodeCode Available | 1 |
| Learning to Check: Unleashing Potentials for Self-Correction in Large Language Models | Feb 20, 2024 | Mathematical Reasoning | CodeCode Available | 1 |
| Large Language Models for Multi-Robot Systems: A Survey | Feb 6, 2025 | Action GenerationBenchmarking | CodeCode Available | 1 |
| Let's Verify Math Questions Step by Step | May 20, 2025 | MathMathematical Reasoning | CodeCode Available | 1 |
| CoMAT: Chain of Mathematically Annotated Thought Improves Mathematical Reasoning | Oct 14, 2024 | MathMathematical Reasoning | CodeCode Available | 1 |