| Enhancing Textbooks with Visuals from the Web for Improved Learning | Apr 18, 2023 | Math | Code Available | 0 |
| Benchmarking Hallucination in Large Language Models based on Unanswerable Math Word Problem | Mar 6, 2024 | Benchmarking, Hallucination | Code Available | 0 |
| Complex Mathematical Symbol Definition Structures: A Dataset and Model for Coordination Resolution in Definition Extraction | May 24, 2023 | Definition Extraction, Math | Code Available | 0 |
| Translating a Math Word Problem to an Expression Tree | Nov 14, 2018 | Math, Math Word Problem Solving | Code Available | 0 |
| Practice Makes a Solver Perfect: Data Augmentation for Math Word Problem Solvers | Apr 30, 2022 | Data Augmentation, Diversity | Code Available | 0 |
| MIRB: Mathematical Information Retrieval Benchmark | May 21, 2025 | Automated Theorem Proving, Information Retrieval | Code Available | 0 |
| Misplaced Trust: Measuring the Interference of Machine Learning in Human Decision-Making | May 22, 2020 | BIG-bench Machine Learning, Decision Making | Code Available | 0 |
| Distinguishing affixoid formations from compounds | Aug 1, 2018 | Management, Math | Code Available | 0 |
| Mixed-R1: Unified Reward Perspective For Reasoning Capability in Multimodal Large Language Models | May 30, 2025 | Math, Multiple-choice | Code Available | 0 |
| Enhancing the Transformer with Explicit Relational Encoding for Math Problem Solving | Oct 15, 2019 | Math, Question Answering | Code Available | 0 |
| AutoTutor meets Large Language Models: A Language Model Tutor with Rich Pedagogy and Guardrails | Feb 14, 2024 | Language Modeling, Language Modelling | Code Available | 0 |
| MMATH: A Multilingual Benchmark for Mathematical Reasoning | May 25, 2025 | Math, Mathematical Reasoning | Code Available | 0 |
| Learning a Continue-Thinking Token for Enhanced Test-Time Scaling | Jun 12, 2025 | GSM8K, Math | Code Available | 0 |
| Algebra Error Classification with Large Language Models | May 8, 2023 | Classification, Math | Code Available | 0 |
| MM-Eval: A Hierarchical Benchmark for Modern Mongolian Evaluation in LLMs | Nov 14, 2024 | General Knowledge, Math | Code Available | 0 |
| Learning by Analogy: Diverse Questions Generation in Math Word Problem | Jun 15, 2023 | Math | Code Available | 0 |
| Scaling up ridge regression for brain encoding in a massive individual fMRI dataset | Mar 28, 2024 | CPU, Math | Code Available | 0 |
| Compositional Processing Emerges in Neural Networks Solving Math Problems | May 19, 2021 | Math, Mathematical Reasoning | Code Available | 0 |
| Learning Decentralized Swarms Using Rotation Equivariant Graph Neural Networks | Feb 24, 2025 | Graph Neural Network, Math | Code Available | 0 |
| Assessing hierarchies by their consistent segmentations | Apr 11, 2022 | Math, Segmentation | Code Available | 0 |
| Activation Steering for Chain-of-Thought Compression | Jul 7, 2025 | GSM8K, Math | Code Available | 0 |