| Measuring Mathematical Problem Solving With the MATH Dataset | Mar 5, 2021 | Math, Mathematical Problem-Solving | Code Available | 2 | 5 |
| MegaMath: Pushing the Limits of Open Math Corpora | Apr 3, 2025 | Diversity, Math | Code Available | 2 | 5 |
| Initialization using Update Approximation is a Silver Bullet for Extremely Efficient Low-Rank Fine-Tuning | Nov 29, 2024 | Mathematical Reasoning | Code Available | 2 | 5 |
| Confucius3-Math: A Lightweight High-Performance Reasoning LLM for Chinese K-12 Mathematics Learning | Jun 23, 2025 | GPU, Large Language Model | Code Available | 2 | 5 |
| CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models | Sep 4, 2024 | GSM8K, Math | Code Available | 2 | 5 |
| CoRT: Code-integrated Reasoning within Thinking | Jun 11, 2025 | Mathematical Reasoning | Code Available | 2 | 5 |
| MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts | Oct 3, 2023 | Chatbot, Image Captioning | Code Available | 2 | 5 |
| MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models | Sep 21, 2023 | Arithmetic Reasoning, GSM8K | Code Available | 2 | 5 |
| Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process | Jul 29, 2024 | GSM8K, Math | Code Available | 2 | 5 |
| Scaling Language Models: Methods, Analysis & Insights from Training Gopher | Dec 8, 2021 | Abstract Algebra, Anachronisms | Code Available | 2 | 5 |