| VisAidMath: Benchmarking Visual-Aided Mathematical Reasoning | Oct 30, 2024 | Benchmarking, Hallucination | Unverified | 0 |
| Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving | May 20, 2024 | GSM8K, Math | Unverified | 0 |
| Mixture-of-Instructions: Comprehensive Alignment of a Large Language Model through the Mixture of Diverse System Prompting Instructions | Apr 29, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| Navigating Semantic Relations: Challenges for Language Models in Abstract Common-Sense Reasoning | Feb 19, 2025 | Common Sense Reasoning, Mathematical Problem-Solving | Unverified | 0 |
| OccamLLM: Fast and Exact Language Model Arithmetic in a Single Step | Jun 4, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| On Vanishing Variance in Transformer Length Generalization | Apr 3, 2025 | Attribute, Mathematical Problem-Solving | Unverified | 0 |
| Performance Comparison of Large Language Models on Advanced Calculus Problems | Mar 5, 2025 | Math, Mathematical Problem-Solving | Unverified | 0 |
| Mathify: Evaluating Large Language Models on Mathematical Problem Solving Tasks | Apr 19, 2024 | Mathematical Problem-Solving | Code Available | 0 |
| MathFlow: Enhancing the Perceptual Flow of MLLMs for Visual Mathematical Problems | Mar 19, 2025 | Mathematical Problem-Solving | Code Available | 0 |
| LocationReasoner: Evaluating LLMs on Real-World Site Selection Reasoning | Jun 16, 2025 | Code Generation, Mathematical Problem-Solving | Code Available | 0 |