| Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models | Mar 21, 2025 | GSM8KQuestion Answering | CodeCode Available | 2 | 5 |
| Dynamic Early Exit in Reasoning Models | Apr 22, 2025 | GSM8KMath | CodeCode Available | 2 | 5 |
| MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models | Sep 21, 2023 | Arithmetic ReasoningGSM8K | CodeCode Available | 2 | 5 |
| MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark | May 20, 2024 | College MathematicsGSM8K | CodeCode Available | 2 | 5 |
| How to Correctly do Semantic Backpropagation on Language-based Agentic Systems | Dec 4, 2024 | GSM8K | CodeCode Available | 2 | 5 |
| Language Models are Multilingual Chain-of-Thought Reasoners | Oct 6, 2022 | GSM8KMath | CodeCode Available | 2 | 5 |
| Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch | Nov 6, 2023 | DecoderGSM8K | CodeCode Available | 2 | 5 |
| ProcessBench: Identifying Process Errors in Mathematical Reasoning | Dec 9, 2024 | GSM8KMath | CodeCode Available | 2 | 5 |
| MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning | Oct 5, 2023 | Arithmetic ReasoningGSM8K | CodeCode Available | 2 | 5 |
| Meta Prompting for AI Systems | Nov 20, 2023 | Data InteractionGSM8K | CodeCode Available | 2 | 5 |