| Large Language Models Don't Make Sense of Word Problems. A Scoping Review from a Mathematics Education Perspective | Jun 30, 2025 | Mathematical Reasoning | —Unverified | 0 |
| Layer Importance for Mathematical Reasoning is Forged in Pre-Training and Invariant after Post-Training | Jun 27, 2025 | Knowledge Distillation, Mathematical Reasoning | —Unverified | 0 |
| Test-time Scaling Techniques in Theoretical Physics -- A Comparison of Methods on the TPBench Dataset | Jun 25, 2025 | Mathematical Reasoning | —Unverified | 0 |
| Inside you are many wolves: Using cognitive models to interpret value trade-offs in LLMs | Jun 25, 2025 | Mathematical Reasoning | —Unverified | 0 |
| AdapThink: Adaptive Thinking Preferences for Reasoning Language Model | Jun 23, 2025 | Diversity, Language Modeling | —Unverified | 0 |
| Confucius3-Math: A Lightweight High-Performance Reasoning LLM for Chinese K-12 Mathematics Learning | Jun 23, 2025 | GPU, Large Language Model | Code Available | 2 |
| PhysUniBench: An Undergraduate-Level Physics Reasoning Benchmark for Multimodal Models | Jun 21, 2025 | Mathematical Reasoning, Multiple-choice | —Unverified | 0 |
| Towards Advanced Mathematical Reasoning for LLMs via First-Order Logic Theorem Proving | Jun 20, 2025 | Automated Theorem Proving, Diversity | —Unverified | 0 |
| Revisiting Chain-of-Thought Prompting: Zero-shot Can Be Stronger than Few-shot | Jun 17, 2025 | In-Context Learning, Mathematical Reasoning | —Unverified | 0 |
| Massive Supervised Fine-tuning Experiments Reveal How Data, Layer, and Training Factors Shape LLM Alignment Quality | Jun 17, 2025 | Code Generation, Mathematical Reasoning | —Unverified | 0 |