| SNIP: Bridging Mathematical Symbolic and Numeric Realms with Unified Pre-training | Oct 3, 2023 | Contrastive Learning, Equation Discovery | Code Available | 1 | 5 |
| The NCTE Transcripts: A Dataset of Elementary Math Classroom Transcripts | Nov 21, 2022 | Elementary Mathematics, Math | Code Available | 1 | 5 |
| Can LLMs Solve longer Math Word Problems Better? | May 23, 2024 | Data Augmentation, Math | Code Available | 0 | 5 |
| A quantitative study of NLP approaches to question difficulty estimation | May 17, 2023 | Math, Multiple-choice | Code Available | 0 | 5 |
| Evaluating Token-Level and Passage-Level Dense Retrieval Models for Math Information Retrieval | Mar 21, 2022 | Information Retrieval, Math | Code Available | 0 | 5 |
| Can LLMs Reason in the Wild with Programs? | Jun 19, 2024 | GSM8K, Math | Code Available | 0 | 5 |
| A Probabilistic Model for Node Classification in Directed Graphs | Jan 3, 2025 | Math, Node Classification | Code Available | 0 | 5 |
| Can LLMs Master Math? Investigating Large Language Models on Math Stack Exchange | Mar 30, 2024 | Math, Mathematical Problem-Solving | Code Available | 0 | 5 |
| Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators | Apr 21, 2025 | Code Generation, Instruction Following | Code Available | 0 | 5 |
| Personalized Exercise Recommendation with Semantically-Grounded Knowledge Tracing | Jul 15, 2025 | Knowledge Tracing, Math | Code Available | 0 | 5 |