| HARDMath2: A Benchmark for Applied Mathematics Built by Students as Part of a Graduate Class | May 17, 2025 | Math, Mathematical Problem-Solving | Code Available | 0 | 5 |
| MIRB: Mathematical Information Retrieval Benchmark | May 21, 2025 | Automated Theorem Proving, Information Retrieval | Code Available | 0 | 5 |
| Complex Mathematical Symbol Definition Structures: A Dataset and Model for Coordination Resolution in Definition Extraction | May 24, 2023 | Definition Extraction, Math | Code Available | 0 | 5 |
| Mixed-R1: Unified Reward Perspective For Reasoning Capability in Multimodal Large Language Models | May 30, 2025 | Math, Multiple-choice | Code Available | 0 | 5 |
| Guiding Through Complexity: What Makes Good Supervision for Hard Reasoning Tasks? | Oct 27, 2024 | Data Augmentation, Math | Code Available | 0 | 5 |
| Mathematics Content Understanding for Cyberlearning via Formula Evolution Map | Dec 31, 2018 | Graph Mining, Math | Code Available | 0 | 5 |
| Guided Speculative Inference for Efficient Test-Time Alignment of LLMs | Jun 4, 2025 | Math | Code Available | 0 | 5 |
| GThinker: Towards General Multimodal Reasoning via Cue-Guided Rethinking | Jun 1, 2025 | 4k, Math | Code Available | 0 | 5 |
| Activation Steering for Chain-of-Thought Compression | Jul 7, 2025 | GSM8K, Math | Code Available | 0 | 5 |
| Combining Large Language Models with Tutoring System Intelligence: A Case Study in Caregiver Homework Support | Dec 16, 2024 | Large Language Model, Math | Code Available | 0 | 5 |