| Reasoning with Reinforced Functional Token Tuning | Feb 19, 2025 | Math | Code available | 1 | 5 |
| Efficient RL Training for Reasoning Models via Length-Aware Optimization | May 18, 2025 | Math | Code available | 1 | 5 |
| Recall and Learn: A Memory-augmented Solver for Math Word Problems | Sep 27, 2021 | Math, Math Word Problem Solving | Code available | 1 | 5 |
| GOLD: Geometry Problem Solver with Natural Language Description | May 1, 2024 | Math | Code available | 1 | 5 |
| Graph-to-Tree Learning for Solving Math Word Problems | Jul 1, 2020 | Decoder, Math | Code available | 1 | 5 |
| Code-Vision: Evaluating Multimodal LLMs Logic Understanding and Code Generation Capabilities | Feb 17, 2025 | Code Generation, HumanEval | Code available | 1 | 5 |
| Get an A in Math: Progressive Rectification Prompting | Dec 11, 2023 | Math | Code available | 1 | 5 |
| Graph-to-Tree Neural Networks for Learning Structured Input-Output Translation with Applications to Semantic Parsing and Math Word Problem | Apr 7, 2020 | Decoder, Machine Translation | Code available | 1 | 5 |
| GeoEval: Benchmark for Evaluating LLMs and Multi-Modal Models on Geometry Problem-Solving | Feb 15, 2024 | Geometry Problem Solving, Math | Code available | 1 | 5 |
| REAL-Prover: Retrieval Augmented Lean Prover for Mathematical Reasoning | May 27, 2025 | Language Modeling, Language Modelling | Code available | 1 | 5 |
| Generating Pedagogically Meaningful Visuals for Math Word Problems: A New Benchmark and Analysis of Text-to-Image Models | Jun 4, 2025 | Math | Code available | 1 | 5 |
| Training Step-Level Reasoning Verifiers with Formal Verification Tools | May 21, 2025 | Formal Logic, Math | Code available | 1 | 5 |
| GeoQA: A Geometric Question Answering Benchmark Towards Multimodal Numerical Reasoning | May 30, 2021 | Math, Mathematical Reasoning | Code available | 1 | 5 |
| ModelingAgent: Bridging LLMs and Mathematical Modeling for Real-World Challenges | May 21, 2025 | Math, valid | Code available | 1 | 5 |
| CLEVR-Math: A Dataset for Compositional Language, Visual and Mathematical Reasoning | Aug 10, 2022 | Math, Mathematical Reasoning | Code available | 1 | 5 |
| RaDeR: Reasoning-aware Dense Retrieval Models | May 23, 2025 | Math, Mathematical Problem-Solving | Code available | 1 | 5 |
| Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination | Jul 14, 2025 | Math, Mathematical Reasoning | Code available | 1 | 5 |
| Ape210K: A Large-Scale and Template-Rich Dataset of Math Word Problems | Sep 24, 2020 | Diversity, Math | Code available | 1 | 5 |
| QuestBench: Can LLMs ask the right question to acquire information in reasoning tasks? | Mar 28, 2025 | Logical Reasoning, Math | Code available | 1 | 5 |
| FormulaNet: A Benchmark Dataset for Mathematical Formula Detection | Aug 29, 2022 | Math | Code available | 1 | 5 |
| Aioli: A Unified Optimization Framework for Language Model Data Mixing | Nov 8, 2024 | Language Modeling, Language Modelling | Code available | 1 | 5 |
| From GAN to WGAN | Apr 18, 2019 | Generative Adversarial Network, Math | Code available | 1 | 5 |
| CityGPT: Empowering Urban Spatial Cognition of Large Language Models | Jun 20, 2024 | Code Generation, Math | Code available | 1 | 5 |
| On the Resilience of LLM-Based Multi-Agent Collaboration with Faulty Agents | Aug 2, 2024 | Code Generation, Large Language Model | Code available | 1 | 5 |
| Forgotten Polygons: Multimodal Large Language Models are Shape-Blind | Feb 21, 2025 | Math, Mathematical Problem-Solving | Code available | 1 | 5 |
| From Zero to Hero: Convincing with Extremely Complicated Math | Apr 1, 2023 | Math | Code available | 1 | 5 |
| Building Dataset for Grounding of Formulae — Annotating Coreference Relations Among Math Identifiers | Jun 1, 2022 | Math | Code available | 1 | 5 |
| A Relation Spectrum Inheriting Taylor Series: Muscle Synergy and Coupling for Hand | Apr 25, 2020 | Math, Relation | Code available | 1 | 5 |
| FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving | Feb 27, 2025 | GSM8K, Math | Code available | 1 | 5 |
| NeMo-Inspector: A Visualization Tool for LLM Generation Analysis | May 1, 2025 | GSM8K, Math | Code available | 1 | 5 |
| ChatCoT: Tool-Augmented Chain-of-Thought Reasoning on Chat-based Large Language Models | May 23, 2023 | Math | Code available | 1 | 5 |
| Neural-Symbolic Solver for Math Word Problems with Auxiliary Tasks | Jul 3, 2021 | Decoder, Math | Code available | 1 | 5 |
| NLPBench: Evaluating Large Language Models on Solving NLP Problems | Sep 27, 2023 | Benchmarking, Math | Code available | 1 | 5 |
| Entropy-Regularized Process Reward Model | Dec 15, 2024 | GSM8K, Math | Code available | 1 | 5 |
| ArMATH: a Dataset for Solving Arabic Math Word Problems | Jun 1, 2022 | Deep Learning, Math | Code available | 1 | 5 |
| Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics | Oct 28, 2024 | Arithmetic Reasoning, Math | Code available | 1 | 5 |
| FELM: Benchmarking Factuality Evaluation of Large Language Models | Oct 1, 2023 | Benchmarking, Math | Code available | 1 | 5 |
| Fine-Tuning Large Language Models on Quantum Optimization Problems for Circuit Generation | Apr 15, 2025 | Math, Quantum Machine Learning | Code available | 1 | 5 |
| PromptCoT: Synthesizing Olympiad-level Problems for Mathematical Reasoning in Large Language Models | Mar 4, 2025 | GSM8K, Math | Code available | 1 | 5 |
| Pretrained Language Models are Symbolic Mathematics Solvers too! | Oct 7, 2021 | Ingenuity, Language Modelling | Code available | 1 | 5 |
| MR-GSM8K: A Meta-Reasoning Benchmark for Large Language Model Evaluation | Dec 28, 2023 | GSM8K, Language Model Evaluation | Code available | 1 | 5 |
| Problem-Oriented Segmentation and Retrieval: Case Study on Tutoring Conversations | Nov 12, 2024 | Math, Retrieval | Code available | 1 | 5 |
| Expression Syntax Information Bottleneck for Math Word Problems | Oct 24, 2023 | Math | Code available | 1 | 5 |
| Evaluating and Improving Tool-Augmented Computation-Intensive Math Reasoning | Jun 4, 2023 | Math | Code available | 1 | 5 |
| Plan, Verify and Switch: Integrated Reasoning with Diverse X-of-Thoughts | Oct 23, 2023 | Logical Reasoning, Math | Code available | 1 | 5 |
| EXAONE Deep: Reasoning Enhanced Language Models | Mar 16, 2025 | Math | Code available | 1 | 5 |
| Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Better Solvers for Math Word Problems | Apr 23, 2024 | Arithmetic Reasoning, GSM8K | Code available | 1 | 5 |
| Explaining Datasets in Words: Statistical Models with Natural Language Parameters | Sep 13, 2024 | Clustering, Language Modeling | Code available | 1 | 5 |
| Are NLP Models really able to Solve Simple Math Word Problems? | Mar 12, 2021 | Math, Math Word Problem Solving | Code available | 1 | 5 |
| Case-Based or Rule-Based: How Do Transformers Do the Math? | Feb 27, 2024 | Math, Systematic Generalization | Code available | 1 | 5 |