| Evaluating Token-Level and Passage-Level Dense Retrieval Models for Math Information Retrieval | Mar 21, 2022 | Information Retrieval, Math | Code Available | 0 |
| MAMUT: A Novel Framework for Modifying Mathematical Formulas for the Generation of Specialized Datasets for Language Model Training | Feb 28, 2025 | Language Modeling, Language Modelling | Code Available | 0 |
| Hard Negative Contrastive Learning for Fine-Grained Geometric Understanding in Large Multimodal Models | May 26, 2025 | Contrastive Learning, Math | Code Available | 0 |
| Effects of structure on reasoning in instance-level Self-Discover | Jul 4, 2025 | Math | Code Available | 0 |
| Mapping to Declarative Knowledge for Word Problem Solving | Dec 26, 2017 | Math, Translation | Code Available | 0 |
| NUMCoT: Numerals and Units of Measurement in Chain-of-Thought Reasoning using Large Language Models | Jun 5, 2024 | Math, Mathematical Reasoning | Code Available | 0 |
| MARGE: Improving Math Reasoning for LLMs with Guided Exploration | May 18, 2025 | Math, Mathematical Reasoning | Code Available | 0 |
| Helpful assistant or fruitful facilitator? Investigating how personas affect language model behavior | Jul 2, 2024 | Language Modeling, Language Modelling | Code Available | 0 |
| Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators | Apr 21, 2025 | Code Generation, Instruction Following | Code Available | 0 |
| Automated Knowledge Concept Annotation and Question Representation Learning for Knowledge Tracing | Oct 2, 2024 | Contrastive Learning, Knowledge Tracing | Code Available | 0 |
| Efficient Non-Parametric Optimizer Search for Diverse Tasks | Sep 27, 2022 | AutoML, Math | Code Available | 0 |
| Heteroclinic cycling and extinction in May-Leonard models with demographic stochasticity | Nov 10, 2021 | Math, Unity | Code Available | 0 |
| Deterministic and Nondeterministic Particle Motion with Interaction Mechanisms | Dec 31, 2022 | Math | Code Available | 0 |
| ArithmAttack: Evaluating Robustness of LLMs to Noisy Context in Math Problem Solving | Jan 14, 2025 | GSM8K, Math | Code Available | 0 |
| LM^2: A Simple Society of Language Models Solves Complex Reasoning | Apr 2, 2024 | Math, MedQA | Code Available | 0 |
| AALC: Large Language Model Efficient Reasoning via Adaptive Accuracy-Length Control | Jun 25, 2025 | Language Modeling, Language Modelling | Code Available | 0 |
| Textual Enhanced Contrastive Learning for Solving Math Word Problems | Nov 29, 2022 | Contrastive Learning, Math | Code Available | 0 |
| ReCUT: Balancing Reasoning Length and Accuracy in LLMs via Stepwise Trails and Preference Optimization | Jun 12, 2025 | Math | Code Available | 0 |
| How Do Humans Write Code? Large Models Do It the Same Way Too | Feb 24, 2024 | Code Generation, Math | Code Available | 0 |
| Don't Get Lost in the Trees: Streamlining LLM Reasoning by Overcoming Tree Search Exploration Pitfalls | Feb 16, 2025 | Computational Efficiency, GSM8K | Code Available | 0 |
| How Is LLM Reasoning Distracted by Irrelevant Context? An Analysis Using a Controlled Benchmark | May 24, 2025 | Math | Code Available | 0 |
| How Should We Enhance the Safety of Large Reasoning Models: An Empirical Study | May 21, 2025 | Math | Code Available | 0 |
| World Models for Math Story Problems | Jun 7, 2023 | Math | Code Available | 0 |
| One Language, Many Gaps: Evaluating Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks | Oct 14, 2024 | Fairness, GSM8K | Code Available | 0 |
| ChatBench: From Static Benchmarks to Human-AI Evaluation | Mar 22, 2025 | Math, MMLU | Code Available | 0 |