| Problem-Oriented Segmentation and Retrieval: Case Study on Tutoring Conversations | Nov 12, 2024 | MathRetrieval | CodeCode Available | 1 | 5 |
| Pretrained Language Models are Symbolic Mathematics Solvers too! | Oct 7, 2021 | IngenuityLanguage Modelling | CodeCode Available | 1 | 5 |
| Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination | Jul 14, 2025 | MathMathematical Reasoning | CodeCode Available | 1 | 5 |
| AI Coders Are Among Us: Rethinking Programming Language Grammar Towards Efficient Code Generation | Apr 25, 2024 | Code GenerationMath | CodeCode Available | 1 | 5 |
| FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving | Feb 27, 2025 | GSM8KMath | CodeCode Available | 1 | 5 |
| RaDeR: Reasoning-aware Dense Retrieval Models | May 23, 2025 | MathMathematical Problem-Solving | CodeCode Available | 1 | 5 |
| Explaining Datasets in Words: Statistical Models with Natural Language Parameters | Sep 13, 2024 | ClusteringLanguage Modeling | CodeCode Available | 1 | 5 |
| REAL-Prover: Retrieval Augmented Lean Prover for Mathematical Reasoning | May 27, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 1 | 5 |
| Expression Syntax Information Bottleneck for Math Word Problems | Oct 24, 2023 | Math | CodeCode Available | 1 | 5 |
| EXAONE Deep: Reasoning Enhanced Language Models | Mar 16, 2025 | Math | CodeCode Available | 1 | 5 |
| Fine-Tuning Large Language Models on Quantum Optimization Problems for Circuit Generation | Apr 15, 2025 | MathQuantum Machine Learning | CodeCode Available | 1 | 5 |
| Reasoning with Reinforced Functional Token Tuning | Feb 19, 2025 | Math | CodeCode Available | 1 | 5 |
| Prover-Verifier Games improve legibility of LLM outputs | Jul 18, 2024 | Math | CodeCode Available | 0 | 5 |
| PT-MoE: An Efficient Finetuning Framework for Integrating Mixture-of-Experts into Prompt Tuning | May 14, 2025 | MathMathematical Problem-Solving | CodeCode Available | 0 | 5 |
| Can LLMs Solve longer Math Word Problems Better? | May 23, 2024 | Data AugmentationMath | CodeCode Available | 0 | 5 |
| A quantitative study of NLP approaches to question difficulty estimation | May 17, 2023 | MathMultiple-choice | CodeCode Available | 0 | 5 |
| Evaluating Token-Level and Passage-Level Dense Retrieval Models for Math Information Retrieval | Mar 21, 2022 | Information RetrievalMath | CodeCode Available | 0 | 5 |
| Can LLMs Reason in the Wild with Programs? | Jun 19, 2024 | GSM8KMath | CodeCode Available | 0 | 5 |
| A Probabilistic Model for Node Classification in Directed Graphs | Jan 3, 2025 | MathNode Classification | CodeCode Available | 0 | 5 |
| Can LLMs Master Math? Investigating Large Language Models on Math Stack Exchange | Mar 30, 2024 | MathMathematical Problem-Solving | CodeCode Available | 0 | 5 |
| Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators | Apr 21, 2025 | Code GenerationInstruction Following | CodeCode Available | 0 | 5 |
| Practice Makes a Solver Perfect: Data Augmentation for Math Word Problem Solvers | Apr 30, 2022 | Data AugmentationDiversity | CodeCode Available | 0 | 5 |
| Evaluating and Optimizing Educational Content with Large Language Model Judgments | Mar 5, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 0 | 5 |
| Can Large Language Models Replicate ITS Feedback on Open-Ended Math Questions? | May 10, 2024 | Mathtext similarity | CodeCode Available | 0 | 5 |
| A Goal-Driven Tree-Structured Neural Model for Math Word Problems | Aug 10, 2019 | MathMath Word Problem Solving | CodeCode Available | 0 | 5 |