| Explaining Datasets in Words: Statistical Models with Natural Language Parameters | Sep 13, 2024 | Clustering, Language Modeling | Code Available | 1 | 5 |
| MedCaseReasoning: Evaluating and learning diagnostic reasoning from clinical case reports | May 16, 2025 | Diagnostic, Math | Code Available | 1 | 5 |
| Mining Mathematical Documents for Question Answering via Unsupervised Formula Labeling | Nov 12, 2022 | Entity Linking, Knowledge Graphs | Code Available | 1 | 5 |
| MultiMath: Bridging Visual and Mathematical Reasoning for Large Language Models | Aug 30, 2024 | Image Captioning, Language Modeling | Code Available | 1 | 5 |
| MATHWELL: Generating Educational Math Word Problems Using Teacher Annotations | Feb 24, 2024 | Language Modeling, Language Modelling | Code Available | 1 | 5 |
| ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large Language Models | Feb 22, 2024 | Math, Mathematical Reasoning | Code Available | 1 | 5 |
| Math Word Problem Solving with Explicit Numerical Values | Aug 1, 2021 | Math, Math Word Problem Solving | Code Available | 1 | 5 |
| Escape Sky-high Cost: Early-stopping Self-Consistency for Multi-step Reasoning | Jan 19, 2024 | GSM8K, Math | Code Available | 1 | 5 |
| MathViz-E: A Case-study in Domain-Specialized Tool-Using Agents | Jul 24, 2024 | Math | Code Available | 1 | 5 |
| MathPrompter: Mathematical Reasoning using Large Language Models | Mar 4, 2023 | Arithmetic Reasoning, Math | Code Available | 1 | 5 |