| From Explicit CoT to Implicit CoT: Learning to Internalize CoT Step by Step | May 23, 2024 | GSM8K | Code Available | 3 |
| Multiple-Choice Questions are Efficient and Robust LLM Evaluators | May 20, 2024 | GSM8K, HumanEval | Code Available | 1 |
| MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark | May 20, 2024 | College Mathematics, GSM8K | Code Available | 2 |
| Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving | May 20, 2024 | GSM8K, Math | —Unverified | 0 |
| Meaning-Typed Programming: Language Abstraction and Runtime for Model-Integrated Applications | May 14, 2024 | GSM8K, Math | —Unverified | 0 |
| MuMath-Code: Combining Tool-Use Large Language Models with Multi-perspective Data Augmentation for Mathematical Reasoning | May 13, 2024 | Data Augmentation, GSM8K | Code Available | 3 |
| MathDivide: Improved mathematical reasoning by large language models | May 12, 2024 | GSM8K, Logical Reasoning | —Unverified | 0 |
| MAmmoTH2: Scaling Instructions from the Web | May 6, 2024 | Chatbot, GSM8K | —Unverified | 0 |
| Exploring the Compositional Deficiency of Large Language Models in Mathematical Reasoning | May 5, 2024 | GSM8K, Math | Code Available | 2 |
| Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning | May 1, 2024 | ARC, GSM8K | Code Available | 3 |