| MegaMath: Pushing the Limits of Open Math Corpora | Apr 3, 2025 | DiversityMath | CodeCode Available | 2 |
| Meta-Design Matters: A Self-Design Multi-Agent System | May 21, 2025 | MathProblem Decomposition | CodeCode Available | 2 |
| An Expression Tree Decoding Strategy for Mathematical Equation Generation | Oct 14, 2023 | MathMathematical Reasoning | CodeCode Available | 2 |
| Autonomous Data Selection with Zero-shot Generative Classifiers for Mathematical Texts | Feb 12, 2024 | Continual PretrainingGSM8K | CodeCode Available | 2 |
| MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models | Sep 21, 2023 | Arithmetic ReasoningGSM8K | CodeCode Available | 2 |
| MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning | Jun 5, 2025 | MathMathematical Reasoning | CodeCode Available | 2 |
| MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts | Oct 3, 2023 | ChatbotImage Captioning | CodeCode Available | 2 |
| MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data | Jun 26, 2024 | BenchmarkingMath | CodeCode Available | 2 |
| Measuring Mathematical Problem Solving With the MATH Dataset | Mar 5, 2021 | MathMathematical Problem-Solving | CodeCode Available | 2 |
| Accelerating Sparse Deep Neural Networks | Apr 16, 2021 | GPUMath | CodeCode Available | 2 |
| Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models | Aug 1, 2024 | Math | CodeCode Available | 2 |
| Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling | Jan 20, 2025 | Imitation LearningLanguage Modeling | CodeCode Available | 2 |
| Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision | Mar 14, 2024 | MathReinforcement Learning (RL) | CodeCode Available | 2 |
| Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models | Jun 25, 2024 | DiversityMath | CodeCode Available | 2 |
| Measuring Multimodal Mathematical Reasoning with MATH-Vision Dataset | Feb 22, 2024 | DiversityMath | CodeCode Available | 2 |
| MAS-Zero: Designing Multi-Agent Systems with Zero Supervision | May 26, 2025 | MathProblem Decomposition | CodeCode Available | 2 |
| DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving | Jun 18, 2024 | Arithmetic ReasoningMath | CodeCode Available | 2 |
| Agent Lumos: Unified and Modular Training for Open-Source Language Agents | Nov 9, 2023 | MathQuestion Answering | CodeCode Available | 2 |
| Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate | Jan 29, 2025 | Instruction FollowingMath | CodeCode Available | 2 |
| MACM: Utilizing a Multi-Agent System for Condition Mining in Solving Complex Mathematical Problems | Apr 6, 2024 | Logical ReasoningMath | CodeCode Available | 2 |
| Cumulative Reasoning with Large Language Models | Aug 8, 2023 | Decision MakingLogical Reasoning | CodeCode Available | 2 |
| AdaptThink: Reasoning Models Can Learn When to Think | May 19, 2025 | Math | CodeCode Available | 2 |
| Delta-CoMe: Training-Free Delta-Compression with Mixed-Precision for Large Language Models | Jun 13, 2024 | MathQuantization | CodeCode Available | 2 |
| MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning | Sep 11, 2023 | MathMathematical Reasoning | CodeCode Available | 2 |
| MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark | May 20, 2024 | College MathematicsGSM8K | CodeCode Available | 2 |