| How Do Large Language Monkeys Get Their Power (Laws)? | Feb 24, 2025 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| The Buffer Mechanism for Multi-Step Information Reasoning in Language Models | May 24, 2024 | Mathematical Problem-Solving | —Unverified | 0 | 0 |
| Holistic Capability Preservation: Towards Compact Yet Comprehensive Reasoning Models | Apr 9, 2025 | Instruction Following, Mathematical Problem-Solving | —Unverified | 0 | 0 |
| FG-PRM: Fine-grained Hallucination Detection and Mitigation in Language Model Mathematical Reasoning | Oct 8, 2024 | GSM8K, Hallucination | —Unverified | 0 | 0 |
| Exposing the Achilles' Heel: Evaluating LLMs Ability to Handle Mistakes in Mathematical Reasoning | Jun 16, 2024 | Benchmarking, Math | —Unverified | 0 | 0 |
| Evaluation of LLMs for mathematical problem solving | May 30, 2025 | GSM8K, Mathematical Problem-Solving | —Unverified | 0 | 0 |
| Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving | May 20, 2024 | GSM8K, Math | —Unverified | 0 | 0 |
| Mixture-of-Instructions: Comprehensive Alignment of a Large Language Model through the Mixture of Diverse System Prompting Instructions | Apr 29, 2024 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| Can reasoning models comprehend mathematical problems in Chinese ancient texts? An empirical study based on data from Suanjing Shishu | May 22, 2025 | Mathematical Problem-Solving | —Unverified | 0 | 0 |
| Navigating Semantic Relations: Challenges for Language Models in Abstract Common-Sense Reasoning | Feb 19, 2025 | Common Sense Reasoning, Mathematical Problem-Solving | —Unverified | 0 | 0 |