| MathScape: Evaluating MLLMs in multimodal Math Scenarios through a Hierarchical Benchmark | Aug 14, 2024 | Math, Mathematical Reasoning | Code Available | 0 | 5 |
| MCC-KD: Multi-CoT Consistent Knowledge Distillation | Oct 23, 2023 | Diversity, Knowledge Distillation | Code Available | 0 | 5 |
| Hierarchical Attention Generates Better Proofs | Apr 27, 2025 | Automated Theorem Proving, Mathematical Proofs | Code Available | 0 | 5 |
| Adaptive Graph Pruning for Multi-Agent Communication | Jun 3, 2025 | Code Generation, Large Language Model | Code Available | 0 | 5 |
| HARDMath2: A Benchmark for Applied Mathematics Built by Students as Part of a Graduate Class | May 17, 2025 | Math, Mathematical Problem-Solving | Code Available | 0 | 5 |
| Guided Stream of Search: Learning to Better Search with Language Models via Optimal Path Guidance | Oct 3, 2024 | Mathematical Reasoning | Code Available | 0 | 5 |
| Mathematical Reasoning for Unmanned Aerial Vehicles: A RAG-Based Approach for Complex Arithmetic Reasoning | Jun 5, 2025 | Arithmetic Reasoning, Math | Code Available | 0 | 5 |
| Mathematical Reasoning in Large Language Models: Assessing Logical and Arithmetic Errors across Wide Numerical Ranges | Feb 12, 2025 | GSM8K, Math | Code Available | 0 | 5 |
| GThinker: Towards General Multimodal Reasoning via Cue-Guided Rethinking | Jun 1, 2025 | 4k, Math | Code Available | 0 | 5 |
| Compositional Processing Emerges in Neural Networks Solving Math Problems | May 19, 2021 | Math, Mathematical Reasoning | Code Available | 0 | 5 |
| ComplexFormer: Disruptively Advancing Transformer Inference Ability via Head-Specific Complex Vector Attention | May 15, 2025 | Code Generation, Language Modeling | Code Available | 0 | 5 |
| Give me a hint: Can LLMs take a hint to solve math problems? | Oct 8, 2024 | Adversarial Robustness, Math | Code Available | 0 | 5 |
| CoinMath: Harnessing the Power of Coding Instruction for Math LLMs | Dec 16, 2024 | Descriptive, Math | Code Available | 0 | 5 |
| ATHENA: Mathematical Reasoning with Thought Expansion | Nov 2, 2023 | Math, Mathematical Reasoning | Code Available | 0 | 5 |
| Code Soliloquies for Accurate Calculations in Large Language Models | Sep 21, 2023 | Language Modelling, Large Language Model | Code Available | 0 | 5 |
| MAQA: Evaluating Uncertainty Quantification in LLMs Regarding Data Uncertainty | Aug 13, 2024 | Mathematical Reasoning, Question Answering | Code Available | 0 | 5 |
| MARGE: Improving Math Reasoning for LLMs with Guided Exploration | May 18, 2025 | Math, Mathematical Reasoning | Code Available | 0 | 5 |
| Lost in Cultural Translation: Do LLMs Struggle with Math Across Cultural Contexts? | Mar 23, 2025 | GSM8K, Math | Code Available | 0 | 5 |
| Gap-Filling Prompting Enhances Code-Assisted Mathematical Reasoning | Nov 8, 2024 | Mathematical Reasoning | Code Available | 0 | 5 |
| MAF: Multi-Aspect Feedback for Improving Reasoning in Large Language Models | Oct 19, 2023 | Hallucination, Mathematical Reasoning | Code Available | 0 | 5 |
| Frugal LMs Trained to Invoke Symbolic Solvers Achieve Parameter-Efficient Arithmetic Reasoning | Dec 9, 2023 | Arithmetic Reasoning, Mathematical Reasoning | Code Available | 0 | 5 |
| LogicPuzzleRL: Cultivating Robust Mathematical Reasoning in LLMs via Reinforcement Learning | Jun 5, 2025 | Mathematical Reasoning, reinforcement-learning | Code Available | 0 | 5 |
| LogicCat: A Chain-of-Thought Text-to-SQL Benchmark for Multi-Domain Reasoning Challenges | May 24, 2025 | Benchmarking, Mathematical Reasoning | Code Available | 0 | 5 |
| Mathador-LM: A Dynamic Benchmark for Mathematical Reasoning on Large Language Models | Jun 18, 2024 | Mathematical Reasoning | Code Available | 0 | 5 |
| From Good to Great: Improving Math Reasoning with Tool-Augmented Interleaf Prompting | Dec 18, 2023 | Diversity, GSM8K | Unverified | 0 | 0 |