| Turning large language models into cognitive models | Jun 6, 2023 | Decision Making, Mathematical Reasoning | Code Available | 1 |
| Evaluating Language Models for Mathematics through Interactions | Jun 2, 2023 | Language Modelling, Mathematical Problem-Solving | Code Available | 1 |
| Learning Multi-Step Reasoning by Solving Arithmetic Tasks | Jun 2, 2023 | Math, Mathematical Reasoning | Code Available | 1 |
| A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis | May 24, 2023 | Arithmetic Reasoning, Mathematical Reasoning | Code Available | 1 |
| FedCBO: Reaching Group Consensus in Clustered Federated Learning through Consensus-based Optimization | May 4, 2023 | Federated Learning, global-optimization | Code Available | 1 |
| Natural Language Reasoning, A Survey | Mar 26, 2023 | Logical Reasoning, Mathematical Reasoning | Code Available | 1 |
| MathPrompter: Mathematical Reasoning using Large Language Models | Mar 4, 2023 | Arithmetic Reasoning, Math | Code Available | 1 |
| A Multi-Modal Neural Geometric Solver with Textual Clauses Parsed from Diagram | Feb 22, 2023 | Geometry Problem Solving, Mathematical Reasoning | Code Available | 1 |
| Tree-Based Representation and Generation of Natural and Mathematical Language | Feb 15, 2023 | Math, Mathematical Reasoning | Code Available | 1 |
| Mathematical Capabilities of ChatGPT | Jan 31, 2023 | Elementary Mathematics, Math | Code Available | 1 |
| UniGeo: Unifying Geometry Logical Reasoning via Reformulating Mathematical Expression | Dec 6, 2022 | Geometry Problem Solving, Logical Reasoning | Code Available | 1 |
| Peano: Learning Formal Mathematical Reasoning | Nov 29, 2022 | Automated Theorem Proving, Mathematical Reasoning | Code Available | 1 |
| Lila: A Unified Benchmark for Mathematical Reasoning | Oct 31, 2022 | Diversity, Mathematical Reasoning | Code Available | 1 |
| A Causal Framework to Quantify the Robustness of Mathematical Reasoning with Language Models | Oct 21, 2022 | Math, Mathematical Reasoning | Code Available | 1 |
| Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning | Sep 29, 2022 | Logical Reasoning, Math | Code Available | 1 |
| CLEVR-Math: A Dataset for Compositional Language, Visual and Mathematical Reasoning | Aug 10, 2022 | Math, Mathematical Reasoning | Code Available | 1 |
| A Neural Network Solves, Explains, and Generates University Math Problems by Program Synthesis and Few-Shot Learning at Human Level | Dec 31, 2021 | Few-Shot Learning, Language Modelling | Code Available | 1 |
| IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning | Oct 25, 2021 | Arithmetic Reasoning, Mathematical Question Answering | Code Available | 1 |
| A Reinforcement Learning Environment for Mathematical Reasoning via Program Synthesis | Jul 15, 2021 | Mathematical Reasoning, Program Synthesis | Code Available | 1 |
| GeoQA: A Geometric Question Answering Benchmark Towards Multimodal Numerical Reasoning | May 30, 2021 | Math, Mathematical Reasoning | Code Available | 1 |
| Inter-GPS: Interpretable Geometry Problem Solving with Formal Language and Symbolic Reasoning | May 10, 2021 | Arithmetic Reasoning, Geometry Problem Solving | Code Available | 1 |
| LIME: Learning Inductive Bias for Primitives of Mathematical Reasoning | Jan 15, 2021 | Inductive Bias, Mathematical Reasoning | Code Available | 1 |
| IsarStep: a Benchmark for High-level Mathematical Reasoning | Jun 13, 2020 | Mathematical Proofs, Mathematical Reasoning | Code Available | 1 |
| VAR-MATH: Probing True Mathematical Reasoning in Large Language Models via Symbolic Multi-Instance Benchmarks | Jul 17, 2025 | Math, Mathematical Reasoning | Unverified | 0 |
| A Survey of Deep Learning for Geometry Problem Solving | Jul 16, 2025 | Deep Learning, Geometry Problem Solving | Code Available | 0 |
| KisMATH: Do LLMs Have Knowledge of Implicit Structures in Mathematical Reasoning? | Jul 15, 2025 | GSM8K, Language Modeling | Unverified | 0 |
| Integrating External Tools with Large Language Models to Improve Accuracy | Jul 9, 2025 | Mathematical Reasoning, MMLU | Unverified | 0 |
| Agentic-R1: Distilled Dual-Strategy Reasoning | Jul 8, 2025 | Mathematical Reasoning | Code Available | 0 |
| CoRE: Enhancing Metacognition with Label-free Self-evaluation in LRMs | Jul 8, 2025 | GSM8K, Math | Unverified | 0 |
| Large Language Models Don't Make Sense of Word Problems. A Scoping Review from a Mathematics Education Perspective | Jun 30, 2025 | Mathematical Reasoning | Unverified | 0 |
| Layer Importance for Mathematical Reasoning is Forged in Pre-Training and Invariant after Post-Training | Jun 27, 2025 | Knowledge Distillation, Mathematical Reasoning | Unverified | 0 |
| Test-time Scaling Techniques in Theoretical Physics -- A Comparison of Methods on the TPBench Dataset | Jun 25, 2025 | Mathematical Reasoning | Unverified | 0 |
| Inside you are many wolves: Using cognitive models to interpret value trade-offs in LLMs | Jun 25, 2025 | Mathematical Reasoning | Unverified | 0 |
| AdapThink: Adaptive Thinking Preferences for Reasoning Language Model | Jun 23, 2025 | Diversity, Language Modeling | Unverified | 0 |
| PhysUniBench: An Undergraduate-Level Physics Reasoning Benchmark for Multimodal Models | Jun 21, 2025 | Mathematical Reasoning, Multiple-choice | Unverified | 0 |
| Towards Advanced Mathematical Reasoning for LLMs via First-Order Logic Theorem Proving | Jun 20, 2025 | Automated Theorem Proving, Diversity | Unverified | 0 |
| Massive Supervised Fine-tuning Experiments Reveal How Data, Layer, and Training Factors Shape LLM Alignment Quality | Jun 17, 2025 | Code Generation, Mathematical Reasoning | Unverified | 0 |
| Revisiting Chain-of-Thought Prompting: Zero-shot Can Be Stronger than Few-shot | Jun 17, 2025 | In-Context Learning, Mathematical Reasoning | Unverified | 0 |
| Investigating the interaction of linguistic and mathematical reasoning in language models using multilingual number puzzles | Jun 16, 2025 | Diversity, Mathematical Reasoning | Unverified | 0 |
| A Technical Study into Small Reasoning Language Models | Jun 16, 2025 | Code Generation, Computational Efficiency | Unverified | 0 |
| LearnAlign: Reasoning Data Selection for Reinforcement Learning in Large Language Models Based on Improved Gradient Alignment | Jun 13, 2025 | GSM8K, Mathematical Reasoning | Unverified | 0 |
| Eliciting Reasoning in Language Models with Cognitive Tools | Jun 13, 2025 | Mathematical Reasoning, Reinforcement Learning (RL) | Unverified | 0 |
| Investigating the Potential of Large Language Model-Based Router Multi-Agent Architectures for Foundation Design Automation: A Task Classification and Expert Selection Study | Jun 13, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| NeuralNexus at BEA 2025 Shared Task: Retrieval-Augmented Prompting for Mistake Identification in AI Tutors | Jun 12, 2025 | Language Modeling, Language Modelling | Code Available | 0 |
| PREMISE: Scalable and Strategic Prompt Optimization for Efficient Mathematical Reasoning in Large Models | Jun 12, 2025 | GSM8K, Mathematical Reasoning | Unverified | 0 |
| Slimming Down LLMs Without Losing Their Minds | Jun 12, 2025 | Computational Efficiency, GSM8K | Unverified | 0 |
| TeleMath: A Benchmark for Large Language Models in Telecom Mathematical Problem Solving | Jun 12, 2025 | Logical Reasoning, Mathematical Problem-Solving | Unverified | 0 |
| Beyond Gold Standards: Epistemic Ensemble of LLM Judges for Formal Mathematical Reasoning | Jun 12, 2025 | Mathematical Reasoning | Unverified | 0 |
| Discovering Hierarchical Latent Capabilities of Language Models via Causal Representation Learning | Jun 12, 2025 | Instruction Following, Mathematical Reasoning | Code Available | 0 |
| Omni-DPO: A Dual-Perspective Paradigm for Dynamic Preference Learning of LLMs | Jun 11, 2025 | Mathematical Reasoning | Code Available | 0 |