| LocationReasoner: Evaluating LLMs on Real-World Site Selection Reasoning | Jun 16, 2025 | Code Generation, Mathematical Problem-Solving | Code Available | 0 | 5 |
| MathFlow: Enhancing the Perceptual Flow of MLLMs for Visual Mathematical Problems | Mar 19, 2025 | Mathematical Problem-Solving | Code Available | 0 | 5 |
| Mathify: Evaluating Large Language Models on Mathematical Problem Solving Tasks | Apr 19, 2024 | Mathematical Problem-Solving | Code Available | 0 | 5 |
| PT-MoE: An Efficient Finetuning Framework for Integrating Mixture-of-Experts into Prompt Tuning | May 14, 2025 | Math, Mathematical Problem-Solving | Code Available | 0 | 5 |
| SEGO: Sequential Subgoal Optimization for Mathematical Problem-Solving | Oct 19, 2023 | GSM8K, Math | Code Available | 0 | 5 |
| SmallToLarge (S2L): Scalable Data Selection for Fine-tuning Large Language Models by Summarizing Training Trajectories of Small Models | Mar 12, 2024 | Math, Mathematical Problem-Solving | Code Available | 0 | 5 |
| JiuZhang 2.0: A Unified Chinese Pre-trained Language Model for Multi-task Mathematical Problem Solving | Jun 19, 2023 | In-Context Learning, Language Modeling | —Unverified | 0 | 0 |
| Improving Small-Scale Large Language Models Function Calling for Reasoning Tasks | Oct 24, 2024 | Logical Reasoning, Mathematical Problem-Solving | —Unverified | 0 | 0 |
| MathFimer: Enhancing Mathematical Reasoning by Expanding Reasoning Steps through Fill-in-the-Middle Task | Feb 17, 2025 | Code Completion, GSM8K | —Unverified | 0 | 0 |
| Towards Spoken Mathematical Reasoning: Benchmarking Speech-based Models over Multi-faceted Math Problems | May 21, 2025 | Benchmarking, Math | —Unverified | 0 | 0 |
| How Do Large Language Monkeys Get Their Power (Laws)? | Feb 24, 2025 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| The Buffer Mechanism for Multi-Step Information Reasoning in Language Models | May 24, 2024 | Mathematical Problem-Solving | —Unverified | 0 | 0 |
| Holistic Capability Preservation: Towards Compact Yet Comprehensive Reasoning Models | Apr 9, 2025 | Instruction Following, Mathematical Problem-Solving | —Unverified | 0 | 0 |
| FG-PRM: Fine-grained Hallucination Detection and Mitigation in Language Model Mathematical Reasoning | Oct 8, 2024 | GSM8K, Hallucination | —Unverified | 0 | 0 |
| Exposing the Achilles' Heel: Evaluating LLMs Ability to Handle Mistakes in Mathematical Reasoning | Jun 16, 2024 | Benchmarking, Math | —Unverified | 0 | 0 |
| Evaluation of LLMs for mathematical problem solving | May 30, 2025 | GSM8K, Mathematical Problem-Solving | —Unverified | 0 | 0 |
| Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving | May 20, 2024 | GSM8K, Math | —Unverified | 0 | 0 |
| Mixture-of-Instructions: Comprehensive Alignment of a Large Language Model through the Mixture of Diverse System Prompting Instructions | Apr 29, 2024 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| Can reasoning models comprehend mathematical problems in Chinese ancient texts? An empirical study based on data from Suanjing Shishu | May 22, 2025 | Mathematical Problem-Solving | —Unverified | 0 | 0 |
| Navigating Semantic Relations: Challenges for Language Models in Abstract Common-Sense Reasoning | Feb 19, 2025 | Common Sense Reasoning, Mathematical Problem-Solving | —Unverified | 0 | 0 |
| Can LLMs plan paths with extra hints from solvers? | Oct 7, 2024 | Mathematical Problem-Solving, Program Synthesis | —Unverified | 0 | 0 |
| Building Math Agents with Multi-Turn Iterative Preference Learning | Sep 4, 2024 | GSM8K, Math | —Unverified | 0 | 0 |
| OccamLLM: Fast and Exact Language Model Arithmetic in a Single Step | Jun 4, 2024 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| On Vanishing Variance in Transformer Length Generalization | Apr 3, 2025 | Attribute, Mathematical Problem-Solving | —Unverified | 0 | 0 |
| Brains vs. Bytes: Evaluating LLM Proficiency in Olympiad Mathematics | Apr 1, 2025 | Math, Mathematical Problem-Solving | —Unverified | 0 | 0 |
| Beyond Traditional Teaching: The Potential of Large Language Models and Chatbots in Graduate Engineering Education | Sep 9, 2023 | Chatbot, Mathematical Problem-Solving | —Unverified | 0 | 0 |
| Performance Comparison of Large Language Models on Advanced Calculus Problems | Mar 5, 2025 | Math, Mathematical Problem-Solving | —Unverified | 0 | 0 |
| PersonaMath: Enhancing Math Reasoning through Persona-Driven Data Augmentation | Oct 2, 2024 | Data Augmentation, Diversity | —Unverified | 0 | 0 |
| PoLAR: Polar-Decomposed Low-Rank Adapter Representation | Jun 3, 2025 | Mathematical Problem-Solving, Riemannian optimization | —Unverified | 0 | 0 |
| Premise Order Matters in Reasoning with Large Language Models | Feb 14, 2024 | GSM8K, Mathematical Problem-Solving | —Unverified | 0 | 0 |
| 3D-Properties: Identifying Challenges in DPO and Charting a Path Forward | Jun 11, 2024 | Instruction Following, Mathematical Problem-Solving | —Unverified | 0 | 0 |
| Bayesian artificial brain with ChatGPT | Aug 28, 2023 | Mathematical Problem-Solving | —Unverified | 0 | 0 |
| Reasoning Models Can Be Effective Without Thinking | Apr 14, 2025 | Automated Theorem Proving, Mathematical Problem-Solving | —Unverified | 0 | 0 |
| Scaling Autonomous Agents via Automatic Reward Modeling And Planning | Feb 17, 2025 | Decision Making, Mathematical Problem-Solving | —Unverified | 0 | 0 |
| Scaling Laws for Autoregressive Generative Modeling | Oct 28, 2020 | Mathematical Problem-Solving | —Unverified | 0 | 0 |
| SECURA: Sigmoid-Enhanced CUR Decomposition with Uninterrupted Retention and Low-Rank Adaptation in Large Language Models | Feb 25, 2025 | Continual Learning, GSM8K | —Unverified | 0 | 0 |
| VisAidMath: Benchmarking Visual-Aided Mathematical Reasoning | Oct 30, 2024 | Benchmarking, Hallucination | —Unverified | 0 | 0 |
| Self-Evolved Preference Optimization for Enhancing Mathematical Reasoning in Small Language Models | Mar 4, 2025 | GSM8K, Math | —Unverified | 0 | 0 |
| Is PRM Necessary? Problem-Solving RL Implicitly Induces PRM Capability in LLMs | May 16, 2025 | Mathematical Problem-Solving, Reinforcement Learning (RL) | —Unverified | 0 | 0 |
| SMART: Self-Generating and Self-Validating Multi-Dimensional Assessment for LLMs' Mathematical Problem Solving | May 22, 2025 | Diagnostic, Mathematical Problem-Solving | —Unverified | 0 | 0 |
| Automating Mathematical Proof Generation Using Large Language Model Agents and Knowledge Graphs | Feb 4, 2025 | Formal Logic, Knowledge Graphs | —Unverified | 0 | 0 |
| STRIVE: Structured Reasoning for Self-Improvement in Claim Verification | Feb 17, 2025 | Claim Verification, Mathematical Problem-Solving | —Unverified | 0 | 0 |
| Reasoning with OmniThought: A Large CoT Dataset with Verbosity and Cognitive Difficulty Annotations | May 16, 2025 | Code Generation, Mathematical Problem-Solving | —Unverified | 0 | 0 |
| Automatic Detection of Reflective Thinking in Mathematical Problem Solving based on Unconstrained Bodily Exploration | Dec 18, 2018 | Mathematical Problem-Solving | —Unverified | 0 | 0 |
| Teaching LLMs According to Their Aptitude: Adaptive Reasoning for Mathematical Problem Solving | Feb 17, 2025 | Math, Mathematical Problem-Solving | —Unverified | 0 | 0 |
| TeleMath: A Benchmark for Large Language Models in Telecom Mathematical Problem Solving | Jun 12, 2025 | Logical Reasoning, Mathematical Problem-Solving | —Unverified | 0 | 0 |
| The Consensus Game: Language Model Generation via Equilibrium Search | Oct 13, 2023 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| Three Questions Concerning the Use of Large Language Models to Facilitate Mathematics Learning | Oct 20, 2023 | Mathematical Problem-Solving, Position | —Unverified | 0 | 0 |
| Token-by-Token Regeneration and Domain Biases: A Benchmark of LLMs on Advanced Mathematical Problem-Solving | Jan 28, 2025 | Math, Mathematical Problem-Solving | —Unverified | 0 | 0 |
| Large Language Models for Mathematical Reasoning: Progresses and Challenges | Jan 31, 2024 | Diversity, Math | —Unverified | 0 | 0 |