| Evaluating and Improving Tool-Augmented Computation-Intensive Math Reasoning | Jun 4, 2023 | Math | CodeCode Available | 1 | 5 |
| Efficient RL Training for Reasoning Models via Length-Aware Optimization | May 18, 2025 | Math | CodeCode Available | 1 | 5 |
| Non-Autoregressive Math Word Problem Solver with Unified Tree Structure | May 8, 2023 | Mathvalid | CodeCode Available | 1 | 5 |
| Eliciting Latent Knowledge from Quirky Language Models | Dec 2, 2023 | Anomaly DetectionMath | CodeCode Available | 1 | 5 |
| ChatCoT: Tool-Augmented Chain-of-Thought Reasoning on Chat-based Large Language Models | May 23, 2023 | Math | CodeCode Available | 1 | 5 |
| ArMATH: a Dataset for Solving Arabic Math Word Problems | Jun 1, 2022 | Deep LearningMath | CodeCode Available | 1 | 5 |
| MATHWELL: Generating Educational Math Word Problems Using Teacher Annotations | Feb 24, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 1 | 5 |
| MathViz-E: A Case-study in Domain-Specialized Tool-Using Agents | Jul 24, 2024 | Math | CodeCode Available | 1 | 5 |
| Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics | Oct 28, 2024 | Arithmetic ReasoningMath | CodeCode Available | 1 | 5 |
| EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees | Mar 11, 2025 | ChatbotLanguage Modeling | CodeCode Available | 1 | 5 |