| Effective Skill Unlearning through Intervention and Abstention | Mar 27, 2025 | General Knowledge, Math | Code Available | 0 | 5 |
| Earlier Tokens Contribute More: Learning Direct Preference Optimization From Temporal Decay Perspective | Feb 20, 2025 | GSM8K, Math | Code Available | 0 | 5 |
| DyRRen: A Dynamic Retriever-Reranker-Generator Model for Numerical Reasoning over Tabular and Textual Data | Nov 23, 2022 | Math, Reranking | Code Available | 0 | 5 |
| AgentGroupChat-V2: Divide-and-Conquer Is What LLM-Based Multi-Agent System Need | Jun 18, 2025 | GSM8K, HumanEval | Code Available | 0 | 5 |
| An Independent Evaluation of ChatGPT on Mathematical Word Problems (MWP) | Feb 23, 2023 | Language Modeling, Language Modelling | Code Available | 0 | 5 |
| One Language, Many Gaps: Evaluating Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks | Oct 14, 2024 | Fairness, GSM8K | Code Available | 0 | 5 |
| DOP: Diagnostic-Oriented Prompting for Large Language Models in Mathematical Correction | May 20, 2024 | Diagnostic, Math | Code Available | 0 | 5 |
| Don't Get Lost in the Trees: Streamlining LLM Reasoning by Overcoming Tree Search Exploration Pitfalls | Feb 16, 2025 | Computational Efficiency, GSM8K | Code Available | 0 | 5 |
| An extrapolated and provably convergent algorithm for nonlinear matrix decomposition with the ReLU function | Mar 31, 2025 | Data Compression, Math | Code Available | 0 | 5 |
| OntoMath^PRO Ontology: A Linked Data Hub for Mathematics | Jul 17, 2014 | Math | Code Available | 0 | 5 |
| NUMCoT: Numerals and Units of Measurement in Chain-of-Thought Reasoning using Large Language Models | Jun 5, 2024 | Math, Mathematical Reasoning | Code Available | 0 | 5 |
| Not All Votes Count! Programs as Verifiers Improve Self-Consistency of Language Models for Math Reasoning | Oct 16, 2024 | All, GSM8K | Code Available | 0 | 5 |
| Beyond Examples: High-level Automated Reasoning Paradigm in In-Context Learning via MCTS | Nov 27, 2024 | In-Context Learning, Math | Code Available | 0 | 5 |
| Adversarial Examples for Evaluating Math Word Problem Solvers | Sep 13, 2021 | Adversarial Robustness, Math | Code Available | 0 | 5 |
| An Exploration of Self-Supervised Mutual Information Alignment for Multi-Task Settings | Oct 2, 2024 | 8k, Math | Code Available | 0 | 5 |
| Does ChatGPT Comprehend the Place Value in Numbers When Solving Math Word Problems? | Jun 3, 2023 | Math, Math Word Problem Solving | Code Available | 0 | 5 |
| Beyond Accuracy Optimization: Computer Vision Losses for Large Language Model Fine-Tuning | Sep 20, 2024 | Language Modeling, Language Modelling | Code Available | 0 | 5 |
| DiVERT: Distractor Generation with Variational Errors Represented as Text for Math Multiple-choice Questions | Jun 27, 2024 | Distractor Generation, Math | Code Available | 0 | 5 |
| An algorithm to represent inbreeding trees | Sep 21, 2020 | Math | Code Available | 0 | 5 |
| DIVE: Diversified Iterative Self-Improvement | Jan 1, 2025 | Diversity, GSM8K | Code Available | 0 | 5 |
| Benchmarking Large Language Models for Math Reasoning Tasks | Aug 20, 2024 | Benchmarking, In-Context Learning | Code Available | 0 | 5 |
| Distinguishing affixoid formations from compounds | Aug 1, 2018 | Management, Math | Code Available | 0 | 5 |
| Discriminative Policy Optimization for Token-Level Reward Models | May 29, 2025 | GSM8K, Language Modeling | Code Available | 0 | 5 |
| Benchmarking Hallucination in Large Language Models based on Unanswerable Math Word Problem | Mar 6, 2024 | Benchmarking, Hallucination | Code Available | 0 | 5 |
| An Edge-Enhanced Hierarchical Graph-to-Tree Network for Math Word Problem Solving | Nov 1, 2021 | Decoder, Math | Code Available | 0 | 5 |