| Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks | Nov 22, 2022 | Math | CodeCode Available | 3 | 5 |
| Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling | Feb 10, 2025 | Math | CodeCode Available | 3 | 5 |
| PAL: Program-aided Language Models | Nov 18, 2022 | Arithmetic ReasoningGSM8K | CodeCode Available | 3 | 5 |
| Noise Contrastive Alignment of Language Models with Explicit Rewards | Feb 8, 2024 | Language ModellingMath | CodeCode Available | 3 | 5 |
| MiLoRA: Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning | Jun 13, 2024 | Instruction FollowingMath | CodeCode Available | 3 | 5 |
| MathArena: Evaluating LLMs on Uncontaminated Math Competitions | May 29, 2025 | MathMathematical Reasoning | CodeCode Available | 3 | 5 |
| MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning | May 15, 2025 | cross-modal alignmentGeometry Problem Solving | CodeCode Available | 3 | 5 |
| DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought | Dec 23, 2024 | Machine TranslationMath | CodeCode Available | 3 | 5 |
| Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory | Apr 10, 2025 | MathMMLU | CodeCode Available | 3 | 5 |
| MARIO: MAth Reasoning with code Interpreter Output -- A Reproducible Pipeline | Jan 16, 2024 | GSM8KMath | CodeCode Available | 3 | 5 |