| Efficient Non-Parametric Optimizer Search for Diverse Tasks | Sep 27, 2022 | AutoML, Math | Code Available | 0 | 5 |
| Rethinking Fine-Tuning when Scaling Test-Time Compute: Limiting Confidence Improves Mathematical Reasoning | Feb 11, 2025 | Code Generation, Math | Code Available | 0 | 5 |
| Scalable and Equitable Math Problem Solving Strategy Prediction in Big Educational Data | Aug 7, 2023 | Math, Misconceptions | Code Available | 0 | 5 |
| Smart Vision-Language Reasoners | Jul 5, 2024 | Math, Mathematical Reasoning | Code Available | 0 | 5 |
| Effects of structure on reasoning in instance-level Self-Discover | Jul 4, 2025 | Math | Code Available | 0 | 5 |
| Effective Skill Unlearning through Intervention and Abstention | Mar 27, 2025 | General Knowledge, Math | Code Available | 0 | 5 |
| Earlier Tokens Contribute More: Learning Direct Preference Optimization From Temporal Decay Perspective | Feb 20, 2025 | GSM8K, Math | Code Available | 0 | 5 |
| DyRRen: A Dynamic Retriever-Reranker-Generator Model for Numerical Reasoning over Tabular and Textual Data | Nov 23, 2022 | Math, Reranking | Code Available | 0 | 5 |
| Reasoning Graph Enhanced Exemplars Retrieval for In-Context Learning | Sep 17, 2024 | Few-Shot Learning, In-Context Learning | Code Available | 0 | 5 |
| AgentGroupChat-V2: Divide-and-Conquer Is What LLM-Based Multi-Agent System Need | Jun 18, 2025 | GSM8K, HumanEval | Code Available | 0 | 5 |
| An Independent Evaluation of ChatGPT on Mathematical Word Problems (MWP) | Feb 23, 2023 | Language Modeling, Language Modelling | Code Available | 0 | 5 |
| PT-MoE: An Efficient Finetuning Framework for Integrating Mixture-of-Experts into Prompt Tuning | May 14, 2025 | Math, Mathematical Problem-Solving | Code Available | 0 | 5 |
| Reasoning in Large Language Models Through Symbolic Math Word Problems | Aug 3, 2023 | Math | Code Available | 0 | 5 |
| DOP: Diagnostic-Oriented Prompting for Large Language Models in Mathematical Correction | May 20, 2024 | Diagnostic, Math | Code Available | 0 | 5 |
| Don't Get Lost in the Trees: Streamlining LLM Reasoning by Overcoming Tree Search Exploration Pitfalls | Feb 16, 2025 | Computational Efficiency, GSM8K | Code Available | 0 | 5 |
| An extrapolated and provably convergent algorithm for nonlinear matrix decomposition with the ReLU function | Mar 31, 2025 | Data Compression, Math | Code Available | 0 | 5 |
| Practice Makes a Solver Perfect: Data Augmentation for Math Word Problem Solvers | Apr 30, 2022 | Data Augmentation, Diversity | Code Available | 0 | 5 |
| Beyond Examples: High-level Automated Reasoning Paradigm in In-Context Learning via MCTS | Nov 27, 2024 | In-Context Learning, Math | Code Available | 0 | 5 |
| Adversarial Examples for Evaluating Math Word Problem Solvers | Sep 13, 2021 | Adversarial Robustness, Math | Code Available | 0 | 5 |
| An Exploration of Self-Supervised Mutual Information Alignment for Multi-Task Settings | Oct 2, 2024 | 8k, Math | Code Available | 0 | 5 |
| Personalized Exercise Recommendation with Semantically-Grounded Knowledge Tracing | Jul 15, 2025 | Knowledge Tracing, Math | Code Available | 0 | 5 |
| Prover-Verifier Games improve legibility of LLM outputs | Jul 18, 2024 | Math | Code Available | 0 | 5 |
| Does ChatGPT Comprehend the Place Value in Numbers When Solving Math Word Problems? | Jun 3, 2023 | Math, Math Word Problem Solving | Code Available | 0 | 5 |
| Beyond Accuracy Optimization: Computer Vision Losses for Large Language Model Fine-Tuning | Sep 20, 2024 | Language Modeling, Language Modelling | Code Available | 0 | 5 |
| DiVERT: Distractor Generation with Variational Errors Represented as Text for Math Multiple-choice Questions | Jun 27, 2024 | Distractor Generation, Math | Code Available | 0 | 5 |
| An algorithm to represent inbreeding trees | Sep 21, 2020 | Math | Code Available | 0 | 5 |
| DIVE: Diversified Iterative Self-Improvement | Jan 1, 2025 | Diversity, GSM8K | Code Available | 0 | 5 |
| Benchmarking Large Language Models for Math Reasoning Tasks | Aug 20, 2024 | Benchmarking, In-Context Learning | Code Available | 0 | 5 |
| OntoMath^PRO Ontology: A Linked Data Hub for Mathematics | Jul 17, 2014 | Math | Code Available | 0 | 5 |
| Distinguishing affixoid formations from compounds | Aug 1, 2018 | Management, Math | Code Available | 0 | 5 |
| Discriminative Policy Optimization for Token-Level Reward Models | May 29, 2025 | GSM8K, Language Modeling | Code Available | 0 | 5 |
| Benchmarking Hallucination in Large Language Models based on Unanswerable Math Word Problem | Mar 6, 2024 | Benchmarking, Hallucination | Code Available | 0 | 5 |
| An Edge-Enhanced Hierarchical Graph-to-Tree Network for Math Word Problem Solving | Nov 1, 2021 | Decoder, Math | Code Available | 0 | 5 |
| Not All Votes Count! Programs as Verifiers Improve Self-Consistency of Language Models for Math Reasoning | Oct 16, 2024 | All, GSM8K | Code Available | 0 | 5 |
| NUMCoT: Numerals and Units of Measurement in Chain-of-Thought Reasoning using Large Language Models | Jun 5, 2024 | Math, Mathematical Reasoning | Code Available | 0 | 5 |
| One Language, Many Gaps: Evaluating Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks | Oct 14, 2024 | Fairness, GSM8K | Code Available | 0 | 5 |
| Neural Machine Translation and Sequence-to-sequence Models: A Tutorial | Mar 5, 2017 | Machine Translation, Math | Code Available | 0 | 5 |
| Deterministic and Nondeterministic Particle Motion with Interaction Mechanisms | Dec 31, 2022 | Math | Code Available | 0 | 5 |
| Multi-Scale Attention with Dense Encoder for Handwritten Mathematical Expression Recognition | Jan 5, 2018 | Decoder, Handwritten Mathmatical Expression Recognition | Code Available | 0 | 5 |
| More is More: Addition Bias in Large Language Models | Sep 4, 2024 | Math, Text Summarization | Code Available | 0 | 5 |
| AutoMSC: Automatic Assignment of Mathematics Subject Classification Labels | May 25, 2020 | Articles, Classification | Code Available | 0 | 5 |
| Modeling Intra-Relation in Math Word Problems with Different Functional Multi-Head Attentions | Jul 1, 2019 | Deep Learning, Math | Code Available | 0 | 5 |
| MM-MATH: Advancing Multimodal Math Evaluation with Process Evaluation and Fine-grained Classification | Apr 7, 2024 | Image Comprehension, Math | Code Available | 0 | 5 |
| MMATH: A Multilingual Benchmark for Mathematical Reasoning | May 25, 2025 | Math, Mathematical Reasoning | Code Available | 0 | 5 |
| Automatic Short Math Answer Grading via In-context Meta-learning | May 30, 2022 | automatic short answer grading, In-Context Learning | Code Available | 0 | 5 |
| MM-Eval: A Hierarchical Benchmark for Modern Mongolian Evaluation in LLMs | Nov 14, 2024 | General Knowledge, Math | Code Available | 0 | 5 |
| A Diversity-Enhanced Knowledge Distillation Model for Practical Math Word Problem Solving | Jan 7, 2025 | Diversity, Knowledge Distillation | Code Available | 0 | 5 |
| Misplaced Trust: Measuring the Interference of Machine Learning in Human Decision-Making | May 22, 2020 | BIG-bench Machine Learning, Decision Making | Code Available | 0 | 5 |
| Mixed-R1: Unified Reward Perspective For Reasoning Capability in Multimodal Large Language Models | May 30, 2025 | Math, Multiple-choice | Code Available | 0 | 5 |
| Decomposing Elements of Problem Solving: What "Math" Does RL Teach? | May 28, 2025 | Math, Mathematical Problem-Solving | Code Available | 0 | 5 |