| LLMs Do Not Have Human-Like Working Memory | Apr 30, 2025 | Math | —Unverified | 0 |
| Phi-4-reasoning Technical Report | Apr 30, 2025 | Math, Reinforcement Learning (RL) | —Unverified | 0 |
| Local Prompt Optimization | Apr 29, 2025 | GSM8K, Math | —Unverified | 0 |
| Trace-of-Thought Prompting: Investigating Prompt-Based Knowledge Distillation Through Question Decomposition | Apr 29, 2025 | GSM8K, Knowledge Distillation | —Unverified | 0 |
| Accurate and Diverse LLM Mathematical Reasoning via Automated PRM-Guided GFlowNets | Apr 28, 2025 | Data Augmentation, Diversity | —Unverified | 0 |
| APE-Bench I: Towards File-level Automated Proof Engineering of Formal Math Libraries | Apr 27, 2025 | Automated Theorem Proving, Bug fixing | —Unverified | 0 |
| Evaluating Grounded Reasoning by Code-Assisted Large Language Models for Mathematics | Apr 24, 2025 | Code Generation, Math | —Unverified | 0 |
| Training Large Language Models to Reason via EM Policy Gradient | Apr 24, 2025 | GSM8K, Math | —Unverified | 0 |
| SplitReason: Learning To Offload Reasoning | Apr 23, 2025 | Language Modeling, Language Modelling | —Unverified | 0 |
| DianJin-R1: Evaluating and Enhancing Financial Reasoning in Large Language Models | Apr 22, 2025 | Math | —Unverified | 0 |