| LLM Performance for Code Generation on Noisy Tasks | May 29, 2025 | Benchmarking, Code Generation | Code Available | 0 |
| Adversarial Math Word Problem Generation | Feb 27, 2024 | Math | Code Available | 0 |
| Generalizing Math Word Problem Solvers via Solution Diversification | Dec 1, 2022 | Math | Code Available | 0 |
| Can LLMs Master Math? Investigating Large Language Models on Math Stack Exchange | Mar 30, 2024 | Math, Mathematical Problem-Solving | Code Available | 0 |
| An Exploration of Self-Supervised Mutual Information Alignment for Multi-Task Settings | Oct 2, 2024 | 8k, Math | Code Available | 0 |
| AgentGroupChat-V2: Divide-and-Conquer Is What LLM-Based Multi-Agent System Need | Jun 18, 2025 | GSM8K, HumanEval | Code Available | 0 |
| X-MAS: Towards Building Multi-Agent Systems with Heterogeneous LLMs | May 22, 2025 | Chatbot, Math | Code Available | 0 |
| Earlier Tokens Contribute More: Learning Direct Preference Optimization From Temporal Decay Perspective | Feb 20, 2025 | GSM8K, Math | Code Available | 0 |
| An Edge-Enhanced Hierarchical Graph-to-Tree Network for Math Word Problem Solving | Nov 1, 2021 | Decoder, Math | Code Available | 0 |
| Towards Effective and Efficient Continual Pre-training of Large Language Models | Jul 26, 2024 | Math | Code Available | 0 |
| Stream Aligner: Efficient Sentence-Level Alignment via Distribution Induction | Jan 9, 2025 | Math, Sentence | Code Available | 0 |
| Error Typing for Smarter Rewards: Improving Process Reward Models with Error-Aware Hierarchical Supervision | May 26, 2025 | Hallucination, Math | Code Available | 0 |
| Brain-Inspired Two-Stage Approach: Enhancing Mathematical Reasoning by Imitating Human Thought Processes | Feb 23, 2024 | Math, Mathematical Reasoning | Code Available | 0 |
| Towards Interpretable Sequence Continuation: Analyzing Shared Circuits in Large Language Models | Nov 7, 2023 | Language Modelling, Large Language Model | Code Available | 0 |
| Wide & Deep Learning for Judging Student Performance in Online One-on-one Math Classes | Jul 13, 2022 | Math | Code Available | 0 |
| Automatic Generation of Headlines for Online Math Questions | Nov 27, 2019 | Math | Code Available | 0 |
| We are Who We Cite: Bridges of Influence Between Natural Language Processing and Other Academic Fields | Oct 23, 2023 | Diversity, Math | Code Available | 0 |
| GeoGPT4V: Towards Geometric Multi-modal Large Language Models with Geometric Image Generation | Jun 17, 2024 | Image Generation, Math | Code Available | 0 |
| LogicSolver: Towards Interpretable Math Word Problem Solving with Logical Prompt-enhanced Learning | May 17, 2022 | Math, Math Word Problem Solving | Code Available | 0 |
| Semantically-Aligned Equation Generation for Solving and Reasoning Math Word Problems | Nov 2, 2018 | Decoder, Math | Code Available | 0 |
| The paradox of the compositionality of natural language: a neural machine translation case study | Aug 12, 2021 | Machine Translation, Math | Code Available | 0 |
| Neural Machine Translation and Sequence-to-sequence Models: A Tutorial | Mar 5, 2017 | Machine Translation, Math | Code Available | 0 |
| Give me a hint: Can LLMs take a hint to solve math problems? | Oct 8, 2024 | Adversarial Robustness, Math | Code Available | 0 |
| TEIMMA: The First Content Reuse Annotator for Text, Images, and Math | May 22, 2023 | Math | Code Available | 0 |
| Structure-Unified M-Tree Coding Solver for MathWord Problem | Oct 22, 2022 | Math | Code Available | 0 |
| Bounds on Multi-asset Derivatives via Neural Networks | Nov 13, 2019 | Math | Code Available | 0 |
| HAPO: Training Language Models to Reason Concisely via History-Aware Policy Optimization | May 16, 2025 | Math | Code Available | 0 |
| Who is better at math, Jenny or Jingzhen? Uncovering Stereotypes in Large Language Models | Jul 9, 2024 | Math | Code Available | 0 |
| CER: Confidence Enhanced Reasoning in LLMs | Feb 20, 2025 | Math, Mathematical Reasoning | Code Available | 0 |
| A Robustly Optimized Long Text to Math Models for Numerical Reasoning On FinQA | Jun 29, 2022 | Math | Code Available | 0 |
| TutorGym: A Testbed for Evaluating AI Agents as Tutors and Students | May 2, 2025 | GSM8K, In-Context Learning | Code Available | 0 |
| Continual Pre-training of Language Models for Math Problem Understanding with Syntax-Aware Memory Network | May 1, 2022 | Math | Code Available | 0 |
| Lost in Cultural Translation: Do LLMs Struggle with Math Across Cultural Contexts? | Mar 23, 2025 | GSM8K, Math | Code Available | 0 |
| Reasoning Graph Enhanced Exemplars Retrieval for In-Context Learning | Sep 17, 2024 | Few-Shot Learning, In-Context Learning | Code Available | 0 |
| Reasoning in Large Language Models Through Symbolic Math Word Problems | Aug 3, 2023 | Math | Code Available | 0 |
| The Relationship Between Reasoning and Performance in Large Language Models -- o3 (mini) Thinks Harder, Not Longer | Feb 21, 2025 | Math, Mathematical Reasoning | Code Available | 0 |
| The Rise of Open Science: Tracking the Evolution and Perceived Value of Data and Methods Link-Sharing Practices | Oct 4, 2023 | Articles, Math | Code Available | 0 |
| Greek2MathTex: A Greek Speech-to-Text Framework for LaTeX Equations Generation | Dec 11, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 |
| SemEval-2019 Task 10: Math Question Answering | Jun 1, 2019 | Math, Question Answering | Code Available | 0 |
| Does ChatGPT Comprehend the Place Value in Numbers When Solving Math Word Problems? | Jun 3, 2023 | Math, Math Word Problem Solving | Code Available | 0 |
| Sequence to General Tree: Knowledge-Guided Geometry Word Problem Solving | Jun 2, 2021 | Math | Code Available | 0 |
| Teaching-Inspired Integrated Prompting Framework: A Novel Approach for Enhancing Reasoning in Large Language Models | Oct 10, 2024 | Arithmetic Reasoning, Math | Code Available | 0 |
| GThinker: Towards General Multimodal Reasoning via Cue-Guided Rethinking | Jun 1, 2025 | 4k, Math | Code Available | 0 |
| Guided Speculative Inference for Efficient Test-Time Alignment of LLMs | Jun 4, 2025 | Math | Code Available | 0 |
| Can Vision-Language Models Evaluate Handwritten Math? | Jan 13, 2025 | Math | Code Available | 0 |
| Adversarial Examples for Evaluating Math Word Problem Solvers | Sep 13, 2021 | Adversarial Robustness, Math | Code Available | 0 |
| Guiding Through Complexity: What Makes Good Supervision for Hard Reasoning Tasks? | Oct 27, 2024 | Data Augmentation, Math | Code Available | 0 |
| Effective Skill Unlearning through Intervention and Abstention | Mar 27, 2025 | General Knowledge, Math | Code Available | 0 |
| Not All Votes Count! Programs as Verifiers Improve Self-Consistency of Language Models for Math Reasoning | Oct 16, 2024 | All, GSM8K | Code Available | 0 |
| HARDMath2: A Benchmark for Applied Mathematics Built by Students as Part of a Graduate Class | May 17, 2025 | Math, Mathematical Problem-Solving | Code Available | 0 |