| VAR-MATH: Probing True Mathematical Reasoning in Large Language Models via Symbolic Multi-Instance Benchmarks | Jul 17, 2025 | Math, Mathematical Reasoning | Unverified | 0 |
| A Survey of Deep Learning for Geometry Problem Solving | Jul 16, 2025 | Deep Learning, Geometry Problem Solving | Code Available | 0 |
| KisMATH: Do LLMs Have Knowledge of Implicit Structures in Mathematical Reasoning? | Jul 15, 2025 | GSM8K, Language Modeling | Unverified | 0 |
| Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination | Jul 14, 2025 | Math, Mathematical Reasoning | Code Available | 1 |
| A Practical Two-Stage Recipe for Mathematical LLMs: Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning | Jul 11, 2025 | Math, Mathematical Reasoning | Code Available | 1 |
| Integrating External Tools with Large Language Models to Improve Accuracy | Jul 9, 2025 | Mathematical Reasoning, MMLU | Unverified | 0 |
| Agentic-R1: Distilled Dual-Strategy Reasoning | Jul 8, 2025 | Mathematical Reasoning | Code Available | 0 |
| CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization | Jul 8, 2025 | Active Learning, Automated Theorem Proving | Code Available | 1 |
| CoRE: Enhancing Metacognition with Label-free Self-evaluation in LRMs | Jul 8, 2025 | GSM8K, Math | Unverified | 0 |
| Skywork-R1V3 Technical Report | Jul 8, 2025 | cross-modal alignment, Mathematical Reasoning | Code Available | 7 |
| Large Language Models Don't Make Sense of Word Problems. A Scoping Review from a Mathematics Education Perspective | Jun 30, 2025 | Mathematical Reasoning | Unverified | 0 |
| Layer Importance for Mathematical Reasoning is Forged in Pre-Training and Invariant after Post-Training | Jun 27, 2025 | Knowledge Distillation, Mathematical Reasoning | Unverified | 0 |
| Test-time Scaling Techniques in Theoretical Physics -- A Comparison of Methods on the TPBench Dataset | Jun 25, 2025 | Mathematical Reasoning | Unverified | 0 |
| Inside you are many wolves: Using cognitive models to interpret value trade-offs in LLMs | Jun 25, 2025 | Mathematical Reasoning | Unverified | 0 |
| AdapThink: Adaptive Thinking Preferences for Reasoning Language Model | Jun 23, 2025 | Diversity, Language Modeling | Unverified | 0 |
| Confucius3-Math: A Lightweight High-Performance Reasoning LLM for Chinese K-12 Mathematics Learning | Jun 23, 2025 | GPU, Large Language Model | Code Available | 2 |
| PhysUniBench: An Undergraduate-Level Physics Reasoning Benchmark for Multimodal Models | Jun 21, 2025 | Mathematical Reasoning, Multiple-choice | Unverified | 0 |
| Towards Advanced Mathematical Reasoning for LLMs via First-Order Logic Theorem Proving | Jun 20, 2025 | Automated Theorem Proving, Diversity | Unverified | 0 |
| Massive Supervised Fine-tuning Experiments Reveal How Data, Layer, and Training Factors Shape LLM Alignment Quality | Jun 17, 2025 | Code Generation, Mathematical Reasoning | Unverified | 0 |
| Xolver: Multi-Agent Reasoning with Holistic Experience Learning Just Like an Olympiad Team | Jun 17, 2025 | Code Generation, GSM8K | Code Available | 1 |
| Revisiting Chain-of-Thought Prompting: Zero-shot Can Be Stronger than Few-shot | Jun 17, 2025 | In-Context Learning, Mathematical Reasoning | Unverified | 0 |
| Investigating the interaction of linguistic and mathematical reasoning in language models using multilingual number puzzles | Jun 16, 2025 | Diversity, Mathematical Reasoning | Unverified | 0 |
| A Technical Study into Small Reasoning Language Models | Jun 16, 2025 | Code Generation, Computational Efficiency | Unverified | 0 |
| Eliciting Reasoning in Language Models with Cognitive Tools | Jun 13, 2025 | Mathematical Reasoning, Reinforcement Learning (RL) | Unverified | 0 |
| LearnAlign: Reasoning Data Selection for Reinforcement Learning in Large Language Models Based on Improved Gradient Alignment | Jun 13, 2025 | GSM8K, Mathematical Reasoning | Unverified | 0 |