| Mind Scramble: Unveiling Large Language Model Psychology Via Typoglycemia | Oct 2, 2024 | Language Modeling, Language Modelling | Code Available | 0 | 5 |
| MIRB: Mathematical Information Retrieval Benchmark | May 21, 2025 | Automated Theorem Proving, Information Retrieval | Code Available | 0 | 5 |
| Meta-Reasoning Improves Tool Use in Large Language Models | Nov 7, 2024 | Math | Code Available | 0 | 5 |
| How Should We Enhance the Safety of Large Reasoning Models: An Empirical Study | May 21, 2025 | Math | Code Available | 0 | 5 |
| How Is LLM Reasoning Distracted by Irrelevant Context? An Analysis Using a Controlled Benchmark | May 24, 2025 | Math | Code Available | 0 | 5 |
| metboost: Exploratory regression analysis with hierarchically clustered data | Feb 13, 2017 | Math, Missing Values | Code Available | 0 | 5 |
| How Do Humans Write Code? Large Models Do It the Same Way Too | Feb 24, 2024 | Code Generation, Math | Code Available | 0 | 5 |
| ConciseRL: Conciseness-Guided Reinforcement Learning for Efficient Reasoning Models | May 22, 2025 | Large Language Model, Math | Code Available | 0 | 5 |
| Misplaced Trust: Measuring the Interference of Machine Learning in Human Decision-Making | May 22, 2020 | BIG-bench Machine Learning, Decision Making | Code Available | 0 | 5 |
| mCoT: Multilingual Instruction Tuning for Reasoning Consistency in Language Models | Jun 4, 2024 | Math | Code Available | 0 | 5 |
| MAWPS: A Math Word Problem Repository | Jun 1, 2016 | Math, Math Word Problem Solving | Code Available | 0 | 5 |
| Heteroclinic cycling and extinction in May-Leonard models with demographic stochasticity | Nov 10, 2021 | Math, Unity | Code Available | 0 | 5 |
| ComSearch: Equation Searching with Combinatorial Strategy for Solving Math Word Problems with Weak Supervision | Oct 13, 2022 | Math | Code Available | 0 | 5 |
| Math Word Problem Solving by Generating Linguistic Variants of Problem Statements | Jun 24, 2023 | Decoder, Ingenuity | Code Available | 0 | 5 |
| Algebra Error Classification with Large Language Models | May 8, 2023 | Classification, Math | Code Available | 0 | 5 |
| Helpful assistant or fruitful facilitator? Investigating how personas affect language model behavior | Jul 2, 2024 | Language Modeling, Language Modelling | Code Available | 0 | 5 |
| ASyMOB: Algebraic Symbolic Mathematical Operations Benchmark | May 28, 2025 | Math | Code Available | 0 | 5 |
| MATHSENSEI: A Tool-Augmented Large Language Model for Mathematical Reasoning | Feb 27, 2024 | 8k, Language Modeling | Code Available | 0 | 5 |
| Computationally Identifying Funneling and Focusing Questions in Classroom Discourse | Jul 8, 2022 | Math | Code Available | 0 | 5 |
| MathScape: Evaluating MLLMs in multimodal Math Scenarios through a Hierarchical Benchmark | Aug 14, 2024 | Math, Mathematical Reasoning | Code Available | 0 | 5 |
| Hard Negative Contrastive Learning for Fine-Grained Geometric Understanding in Large Multimodal Models | May 26, 2025 | Contrastive Learning, Math | Code Available | 0 | 5 |
| Compositional Processing Emerges in Neural Networks Solving Math Problems | May 19, 2021 | Math, Mathematical Reasoning | Code Available | 0 | 5 |
| MathScale: Scaling Instruction Tuning for Mathematical Reasoning | Mar 5, 2024 | GSM8K, Math | Code Available | 0 | 5 |
| HARDMath2: A Benchmark for Applied Mathematics Built by Students as Part of a Graduate Class | May 17, 2025 | Math, Mathematical Problem-Solving | Code Available | 0 | 5 |
| Complex Mathematical Symbol Definition Structures: A Dataset and Model for Coordination Resolution in Definition Extraction | May 24, 2023 | Definition Extraction, Math | Code Available | 0 | 5 |