| Explaining Datasets in Words: Statistical Models with Natural Language Parameters | Sep 13, 2024 | Clustering, Language Modeling | Code Available | 1 | 5 |
| Reasoning with Reinforced Functional Token Tuning | Feb 19, 2025 | Math | Code Available | 1 | 5 |
| Forgotten Polygons: Multimodal Large Language Models are Shape-Blind | Feb 21, 2025 | Math, Mathematical Problem-Solving | Code Available | 1 | 5 |
| FormulaNet: A Benchmark Dataset for Mathematical Formula Detection | Aug 29, 2022 | Math | Code Available | 1 | 5 |
| Generating Pedagogically Meaningful Visuals for Math Word Problems: A New Benchmark and Analysis of Text-to-Image Models | Jun 4, 2025 | Math | Code Available | 1 | 5 |
| RetICL: Sequential Retrieval of In-Context Examples with Reinforcement Learning | May 23, 2023 | In-Context Learning, Language Modelling | Code Available | 1 | 5 |
| RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold | Jun 20, 2024 | Math, Reinforcement Learning (RL) | Code Available | 1 | 5 |
| SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning | Aug 1, 2023 | GSM8K, Math | Code Available | 1 | 5 |
| SORSA: Singular Values and Orthonormal Regularized Singular Vectors Adaptation of Large Language Models | Aug 21, 2024 | 8k, GSM8K | Code Available | 1 | 5 |
| Task-Circuit Quantization: Leveraging Knowledge Localization and Interpretability for Compression | Apr 10, 2025 | Math, MMLU | Code Available | 1 | 5 |
| U-MATH: A University-Level Benchmark for Evaluating Mathematical Skills in LLMs | Dec 4, 2024 | Diversity, Math | Code Available | 1 | 5 |
| EXAONE Deep: Reasoning Enhanced Language Models | Mar 16, 2025 | Math | Code Available | 1 | 5 |
| Can LLMs Solve longer Math Word Problems Better? | May 23, 2024 | Data Augmentation, Math | Code Available | 0 | 5 |
| A quantitative study of NLP approaches to question difficulty estimation | May 17, 2023 | Math, Multiple-choice | Code Available | 0 | 5 |
| Evaluating Token-Level and Passage-Level Dense Retrieval Models for Math Information Retrieval | Mar 21, 2022 | Information Retrieval, Math | Code Available | 0 | 5 |
| Can LLMs Reason in the Wild with Programs? | Jun 19, 2024 | GSM8K, Math | Code Available | 0 | 5 |
| A Probabilistic Model for Node Classification in Directed Graphs | Jan 3, 2025 | Math, Node Classification | Code Available | 0 | 5 |
| Can LLMs Master Math? Investigating Large Language Models on Math Stack Exchange | Mar 30, 2024 | Math, Mathematical Problem-Solving | Code Available | 0 | 5 |
| Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators | Apr 21, 2025 | Code Generation, Instruction Following | Code Available | 0 | 5 |
| Evaluating and Optimizing Educational Content with Large Language Model Judgments | Mar 5, 2024 | Language Modeling, Language Modelling | Code Available | 0 | 5 |
| Can Large Language Models Replicate ITS Feedback on Open-Ended Math Questions? | May 10, 2024 | Math, text similarity | Code Available | 0 | 5 |
| ReCUT: Balancing Reasoning Length and Accuracy in LLMs via Stepwise Trails and Preference Optimization | Jun 12, 2025 | Math | Code Available | 0 | 5 |
| A Goal-Driven Tree-Structured Neural Model for Math Word Problems | Aug 10, 2019 | Math, Math Word Problem Solving | Code Available | 0 | 5 |
| Reasoning Graph Enhanced Exemplars Retrieval for In-Context Learning | Sep 17, 2024 | Few-Shot Learning, In-Context Learning | Code Available | 0 | 5 |
| Error Typing for Smarter Rewards: Improving Process Reward Models with Error-Aware Hierarchical Supervision | May 26, 2025 | Hallucination, Math | Code Available | 0 | 5 |
| EPT-X: An Expression-Pointer Transformer model that generates eXplanations for numbers | May 1, 2022 | Math, Math Word Problem Solving | Code Available | 0 | 5 |
| EquivPruner: Boosting Efficiency and Quality in LLM-Based Search via Action Pruning | May 22, 2025 | GSM8K, Math | Code Available | 0 | 5 |
| Reasoning in Large Language Models Through Symbolic Math Word Problems | Aug 3, 2023 | Math | Code Available | 0 | 5 |
| Enumerate-Conjecture-Prove: Formally Solving Answer-Construction Problems in Math Competitions | May 24, 2025 | Automated Theorem Proving, Math | Code Available | 0 | 5 |
| AALC: Large Language Model Efficient Reasoning via Adaptive Accuracy-Length Control | Jun 25, 2025 | Language Modeling, Language Modelling | Code Available | 0 | 5 |
| PT-MoE: An Efficient Finetuning Framework for Integrating Mixture-of-Experts into Prompt Tuning | May 14, 2025 | Math, Mathematical Problem-Solving | Code Available | 0 | 5 |
| RESOLVE: Relational Reasoning with Symbolic and Object-Level Features Using Vector Symbolic Processing | Nov 13, 2024 | Decoder, Math | Code Available | 0 | 5 |
| Enhancing the Transformer with Explicit Relational Encoding for Math Problem Solving | Oct 15, 2019 | Math, Question Answering | Code Available | 0 | 5 |
| Enhancing Textbooks with Visuals from the Web for Improved Learning | Apr 18, 2023 | Math | Code Available | 0 | 5 |
| Practice Makes a Solver Perfect: Data Augmentation for Math Word Problem Solvers | Apr 30, 2022 | Data Augmentation, Diversity | Code Available | 0 | 5 |
| AgentSwift: Efficient LLM Agent Design via Value-guided Hierarchical Search | Jun 6, 2025 | Large Language Model, Math | Code Available | 0 | 5 |
| Personalized Exercise Recommendation with Semantically-Grounded Knowledge Tracing | Jul 15, 2025 | Knowledge Tracing, Math | Code Available | 0 | 5 |
| OntoMath^PRO Ontology: A Linked Data Hub for Mathematics | Jul 17, 2014 | Math | Code Available | 0 | 5 |
| Brain-Inspired Two-Stage Approach: Enhancing Mathematical Reasoning by Imitating Human Thought Processes | Feb 23, 2024 | Math, Mathematical Reasoning | Code Available | 0 | 5 |
| One Language, Many Gaps: Evaluating Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks | Oct 14, 2024 | Fairness, GSM8K | Code Available | 0 | 5 |
| Bounds on Multi-asset Derivatives via Neural Networks | Nov 13, 2019 | Math | Code Available | 0 | 5 |
| Efficient Non-Parametric Optimizer Search for Diverse Tasks | Sep 27, 2022 | AutoML, Math | Code Available | 0 | 5 |
| NUMCoT: Numerals and Units of Measurement in Chain-of-Thought Reasoning using Large Language Models | Jun 5, 2024 | Math, Mathematical Reasoning | Code Available | 0 | 5 |
| Prover-Verifier Games improve legibility of LLM outputs | Jul 18, 2024 | Math | Code Available | 0 | 5 |
| Rethinking Fine-Tuning when Scaling Test-Time Compute: Limiting Confidence Improves Mathematical Reasoning | Feb 11, 2025 | Code Generation, Math | Code Available | 0 | 5 |
| Effects of structure on reasoning in instance-level Self-Discover | Jul 4, 2025 | Math | Code Available | 0 | 5 |
| Effective Skill Unlearning through Intervention and Abstention | Mar 27, 2025 | General Knowledge, Math | Code Available | 0 | 5 |
| Neural Machine Translation and Sequence-to-sequence Models: A Tutorial | Mar 5, 2017 | Machine Translation, Math | Code Available | 0 | 5 |
| Earlier Tokens Contribute More: Learning Direct Preference Optimization From Temporal Decay Perspective | Feb 20, 2025 | GSM8K, Math | Code Available | 0 | 5 |
| DyRRen: A Dynamic Retriever-Reranker-Generator Model for Numerical Reasoning over Tabular and Textual Data | Nov 23, 2022 | Math, Reranking | Code Available | 0 | 5 |