| MM-Eval: A Hierarchical Benchmark for Modern Mongolian Evaluation in LLMs | Nov 14, 2024 | General Knowledge, Math | Code Available | 0 | 5 |
| Modeling Intra-Relation in Math Word Problems with Different Functional Multi-Head Attentions | Jul 1, 2019 | Deep Learning, Math | Code Available | 0 | 5 |
| Scaling up ridge regression for brain encoding in a massive individual fMRI dataset | Mar 28, 2024 | CPU, Math | Code Available | 0 | 5 |
| Mind Scramble: Unveiling Large Language Model Psychology Via Typoglycemia | Oct 2, 2024 | Language Modeling, Language Modelling | Code Available | 0 | 5 |
| metboost: Exploratory regression analysis with hierarchically clustered data | Feb 13, 2017 | Math, Missing Values | Code Available | 0 | 5 |
| Heteroclinic cycling and extinction in May-Leonard models with demographic stochasticity | Nov 10, 2021 | Math, Unity | Code Available | 0 | 5 |
| ComSearch: Equation Searching with Combinatorial Strategy for Solving Math Word Problems with Weak Supervision | Oct 13, 2022 | Math | Code Available | 0 | 5 |
| Algebra Error Classification with Large Language Models | May 8, 2023 | Classification, Math | Code Available | 0 | 5 |
| Meta-Reasoning Improves Tool Use in Large Language Models | Nov 7, 2024 | Math | Code Available | 0 | 5 |
| Helpful assistant or fruitful facilitator? Investigating how personas affect language model behavior | Jul 2, 2024 | Language Modeling, Language Modelling | Code Available | 0 | 5 |
| ASyMOB: Algebraic Symbolic Mathematical Operations Benchmark | May 28, 2025 | Math | Code Available | 0 | 5 |
| Computationally Identifying Funneling and Focusing Questions in Classroom Discourse | Jul 8, 2022 | Math | Code Available | 0 | 5 |
| Hard Negative Contrastive Learning for Fine-Grained Geometric Understanding in Large Multimodal Models | May 26, 2025 | Contrastive Learning, Math | Code Available | 0 | 5 |
| Compositional Processing Emerges in Neural Networks Solving Math Problems | May 19, 2021 | Math, Mathematical Reasoning | Code Available | 0 | 5 |
| MIRB: Mathematical Information Retrieval Benchmark | May 21, 2025 | Automated Theorem Proving, Information Retrieval | Code Available | 0 | 5 |
| HARDMath2: A Benchmark for Applied Mathematics Built by Students as Part of a Graduate Class | May 17, 2025 | Math, Mathematical Problem-Solving | Code Available | 0 | 5 |
| Complex Mathematical Symbol Definition Structures: A Dataset and Model for Coordination Resolution in Definition Extraction | May 24, 2023 | Definition Extraction, Math | Code Available | 0 | 5 |
| Guiding Through Complexity: What Makes Good Supervision for Hard Reasoning Tasks? | Oct 27, 2024 | Data Augmentation, Math | Code Available | 0 | 5 |
| MAWPS: A Math Word Problem Repository | Jun 1, 2016 | Math, Math Word Problem Solving | Code Available | 0 | 5 |
| In-Context Principle Learning from Mistakes | Feb 8, 2024 | GSM8K, In-Context Learning | Code Available | 0 | 5 |
| mCoT: Multilingual Instruction Tuning for Reasoning Consistency in Language Models | Jun 4, 2024 | Math | Code Available | 0 | 5 |
| Math Word Problem Solving by Generating Linguistic Variants of Problem Statements | Jun 24, 2023 | Decoder, Ingenuity | Code Available | 0 | 5 |
| Guided Speculative Inference for Efficient Test-Time Alignment of LLMs | Jun 4, 2025 | Math | Code Available | 0 | 5 |
| GThinker: Towards General Multimodal Reasoning via Cue-Guided Rethinking | Jun 1, 2025 | 4k, Math | Code Available | 0 | 5 |
| Activation Steering for Chain-of-Thought Compression | Jul 7, 2025 | GSM8K, Math | Code Available | 0 | 5 |
| Combining Large Language Models with Tutoring System Intelligence: A Case Study in Caregiver Homework Support | Dec 16, 2024 | Large Language Model, Math | Code Available | 0 | 5 |
| MATHSENSEI: A Tool-Augmented Large Language Model for Mathematical Reasoning | Feb 27, 2024 | 8k, Language Modeling | Code Available | 0 | 5 |
| Greek2MathTex: A Greek Speech-to-Text Framework for LaTeX Equations Generation | Dec 11, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 | 5 |
| MathScape: Evaluating MLLMs in multimodal Math Scenarios through a Hierarchical Benchmark | Aug 14, 2024 | Math, Mathematical Reasoning | Code Available | 0 | 5 |
| Misplaced Trust: Measuring the Interference of Machine Learning in Human Decision-Making | May 22, 2020 | BIG-bench Machine Learning, Decision Making | Code Available | 0 | 5 |
| Exploring the Reliability of Large Language Models as Customized Evaluators for Diverse NLP Tasks | Oct 30, 2023 | Fairness, Math | Code Available | 0 | 5 |
| CoinMath: Harnessing the Power of Coding Instruction for Math LLMs | Dec 16, 2024 | Descriptive, Math | Code Available | 0 | 5 |
| Mathematical Reasoning in Large Language Models: Assessing Logical and Arithmetic Errors across Wide Numerical Ranges | Feb 12, 2025 | GSM8K, Math | Code Available | 0 | 5 |
| A large language model-assisted education tool to provide feedback on open-ended responses | Jul 25, 2023 | Language Modeling, Language Modelling | Code Available | 0 | 5 |
| Mathematical Reasoning for Unmanned Aerial Vehicles: A RAG-Based Approach for Complex Arithmetic Reasoning | Jun 5, 2025 | Arithmetic Reasoning, Math | Code Available | 0 | 5 |
| Mathematics Content Understanding for Cyberlearning via Formula Evolution Map | Dec 31, 2018 | Graph Mining, Math | Code Available | 0 | 5 |
| Give me a hint: Can LLMs take a hint to solve math problems? | Oct 8, 2024 | Adversarial Robustness, Math | Code Available | 0 | 5 |
| CodeT5+: Open Code Large Language Models for Code Understanding and Generation | May 13, 2023 | Arithmetic Reasoning, Code Completion | Code Available | 0 | 5 |
| PTD-SQL: Partitioning and Targeted Drilling with LLMs in Text-to-SQL | Sep 21, 2024 | Math, Text to SQL | Code Available | 0 | 5 |
| GeoGPT4V: Towards Geometric Multi-modal Large Language Models with Geometric Image Generation | Jun 17, 2024 | Image Generation, Math | Code Available | 0 | 5 |
| Coarse-grained Stochastic Model of Myosin-Driven Vesicles into Dendritic Spines | Jul 15, 2021 | Math | Code Available | 0 | 5 |
| MARGE: Improving Math Reasoning for LLMs with Guided Exploration | May 18, 2025 | Math, Mathematical Reasoning | Code Available | 0 | 5 |
| MAMUT: A Novel Framework for Modifying Mathematical Formulas for the Generation of Specialized Datasets for Language Model Training | Feb 28, 2025 | Language Modeling, Language Modelling | Code Available | 0 | 5 |
| Mapping to Declarative Knowledge for Word Problem Solving | Dec 26, 2017 | Math, Translation | Code Available | 0 | 5 |
| Lost in Cultural Translation: Do LLMs Struggle with Math Across Cultural Contexts? | Mar 23, 2025 | GSM8K, Math | Code Available | 0 | 5 |
| A Context-Enhanced Framework for Sequential Graph Reasoning | Dec 12, 2024 | Math | Code Available | 0 | 5 |
| Generalizing Math Word Problem Solvers via Solution Diversification | Dec 1, 2022 | Math | Code Available | 0 | 5 |
| MathScale: Scaling Instruction Tuning for Mathematical Reasoning | Mar 5, 2024 | GSM8K, Math | Code Available | 0 | 5 |
| Adversarial Math Word Problem Generation | Feb 27, 2024 | Math | Code Available | 0 | 5 |
| LLM Performance for Code Generation on Noisy Tasks | May 29, 2025 | Benchmarking, Code Generation | Code Available | 0 | 5 |