| TinyGSM: achieving >80% on GSM8k with small language models | Dec 14, 2023 | Arithmetic ReasoningGSM8K | —Unverified | 0 |
| Token-Supervised Value Models for Enhancing Mathematical Reasoning Capabilities of Large Language Models | Jul 12, 2024 | GSM8KMath | —Unverified | 0 |
| Towards Multilingual LLM Evaluation for European Languages | Oct 11, 2024 | ARCGSM8K | —Unverified | 0 |
| Towards Intrinsic Self-Correction Enhancement in Monte Carlo Tree Search Boosted Reasoning via Iterative Preference Learning | Dec 23, 2024 | Arithmetic ReasoningGSM8K | —Unverified | 0 |
| Trace-of-Thought Prompting: Investigating Prompt-Based Knowledge Distillation Through Question Decomposition | Apr 29, 2025 | GSM8KKnowledge Distillation | —Unverified | 0 |
| Training Chain-of-Thought via Latent-Variable Inference | Nov 28, 2023 | GSM8K | —Unverified | 0 |
| Training-Free Mitigation of Language Reasoning Degradation After Multimodal Instruction Tuning | Dec 4, 2024 | GSM8KLanguage Modeling | —Unverified | 0 |
| Training Large Language Models to Reason via EM Policy Gradient | Apr 24, 2025 | GSM8KMath | —Unverified | 0 |
| Transcending Scaling Laws with 0.1% Extra Compute | Oct 20, 2022 | Arithmetic ReasoningCross-Lingual Question Answering | —Unverified | 0 |
| Efficient Contextual LLM Cascades through Budget-Constrained Policy Learning | Apr 17, 2024 | GSM8KNavigate | —Unverified | 0 |