| Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models | Mar 21, 2025 | GSM8K, Question Answering | Code Available | 2 |
| Scaling Relationship on Learning Mathematical Reasoning with Large Language Models | Aug 3, 2023 | Arithmetic Reasoning, GSM8K | Code Available | 2 |
| Autonomous Data Selection with Zero-shot Generative Classifiers for Mathematical Texts | Feb 12, 2024 | Continual Pretraining, GSM8K | Code Available | 2 |
| MuggleMath: Assessing the Impact of Query and Response Augmentation on Math Reasoning | Oct 9, 2023 | Arithmetic Reasoning, Data Augmentation | Code Available | 2 |
| Preference Optimization for Reasoning with Pseudo Feedback | Nov 25, 2024 | GSM8K, Math | Code Available | 2 |
| GSM-Plus: A Comprehensive Benchmark for Evaluating the Robustness of LLMs as Mathematical Problem Solvers | Feb 29, 2024 | GSM8K, Math | Code Available | 2 |
| ProcessBench: Identifying Process Errors in Mathematical Reasoning | Dec 9, 2024 | GSM8K, Math | Code Available | 2 |
| How to Correctly do Semantic Backpropagation on Language-based Agentic Systems | Dec 4, 2024 | GSM8K | Code Available | 2 |
| Progressive-Hint Prompting Improves Reasoning in Large Language Models | Apr 19, 2023 | Arithmetic Reasoning, GSM8K | Code Available | 2 |
| Reformatted Alignment | Feb 19, 2024 | GSM8K, Hallucination | Code Available | 2 |
| Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models | Oct 10, 2024 | GSM8K, Math | Code Available | 2 |
| any4: Learned 4-bit Numeric Representation for LLMs | Jul 7, 2025 | GPU, GSM8K | Code Available | 2 |
| Natural Language Fine-Tuning | Dec 29, 2024 | GSM8K, Large Language Model | Code Available | 2 |
| Exploring the Compositional Deficiency of Large Language Models in Mathematical Reasoning | May 5, 2024 | GSM8K, Math | Code Available | 2 |
| Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models | Feb 24, 2025 | GSM8K, Math | Code Available | 2 |
| Offline Reinforcement Learning for LLM Multi-Step Reasoning | Dec 20, 2024 | GSM8K, Math | Code Available | 2 |
| MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models | Sep 21, 2023 | Arithmetic Reasoning, GSM8K | Code Available | 2 |
| Meta Prompting for AI Systems | Nov 20, 2023 | Data Interaction, GSM8K | Code Available | 2 |
| Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization | Oct 11, 2024 | GSM8K, Language Modeling | Code Available | 2 |
| CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models | Mar 28, 2025 | GPU, GSM8K | Code Available | 2 |
| MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning | Oct 5, 2023 | Arithmetic Reasoning, GSM8K | Code Available | 2 |
| Dynamic Early Exit in Reasoning Models | Apr 22, 2025 | GSM8K, Math | Code Available | 2 |
| Balancing LoRA Performance and Efficiency with Simple Shard Sharing | Sep 19, 2024 | Computational Efficiency, GSM8K | Code Available | 2 |
| CoT-Valve: Length-Compressible Chain-of-Thought Tuning | Feb 13, 2025 | GSM8K | Code Available | 2 |
| Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding | Nov 6, 2024 | ARC, GSM8K | Code Available | 2 |