| Aligning Tutor Discourse Supporting Rigorous Thinking with Tutee Content Mastery for Predicting Math Achievement | May 10, 2024 | MathMathematical Reasoning | —Unverified | 0 |
| LLMs can Find Mathematical Reasoning Mistakes by Pedagogical Chain-of-Thought | May 9, 2024 | HallucinationMath | —Unverified | 0 |
| VisionGraph: Leveraging Large Multimodal Models for Graph Theory Problems in Visual Context | May 8, 2024 | MathMathematical Reasoning | CodeCode Available | 1 |
| AlphaMath Almost Zero: Process Supervision without Process | May 6, 2024 | Mathematical ReasoningMath Word Problem Solving | CodeCode Available | 3 |
| Exploring the Compositional Deficiency of Large Language Models in Mathematical Reasoning | May 5, 2024 | GSM8KMath | CodeCode Available | 2 |
| GOLD: Geometry Problem Solver with Natural Language Description | May 1, 2024 | Math | CodeCode Available | 1 |
| A Careful Examination of Large Language Model Performance on Grade School Arithmetic | May 1, 2024 | GSM8KLanguage Modeling | —Unverified | 0 |
| Exploring the Limits of Fine-grained LLM-based Physics Inference via Premise Removal Interventions | Apr 29, 2024 | Mathematical Reasoning | —Unverified | 0 |
| Benchmarking Benchmark Leakage in Large Language Models | Apr 29, 2024 | BenchmarkingMathematical Reasoning | CodeCode Available | 2 |
| Describe-then-Reason: Improving Multimodal Mathematical Reasoning through Visual Comprehension Training | Apr 22, 2024 | MathMathematical Reasoning | —Unverified | 0 |
| PARAMANU-GANITA: Language Model with Mathematical Capabilities | Apr 22, 2024 | Domain AdaptationGSM8K | —Unverified | 0 |
| Pre-Calc: Learning to Use the Calculator Improves Numeracy in Language Models | Apr 22, 2024 | DecoderMathematical Reasoning | CodeCode Available | 0 |
| iTBLS: A Dataset of Interactive Conversations Over Tabular Information | Apr 19, 2024 | ArticlesMathematical Reasoning | —Unverified | 0 |
| Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing | Apr 18, 2024 | Arithmetic ReasoningGSM8K | CodeCode Available | 1 |
| Enhancing Length Extrapolation in Sequential Models with Pointer-Augmented Neural Memory | Apr 18, 2024 | Machine TranslationMathematical Reasoning | —Unverified | 0 |
| Paraphrase and Solve: Exploring and Exploiting the Impact of Surface Form on Mathematical Reasoning in Large Language Models | Apr 17, 2024 | FormLanguage Model Evaluation | CodeCode Available | 0 |
| Self-Explore: Enhancing Mathematical Reasoning in Language Models with Fine-grained Rewards | Apr 16, 2024 | GSM8KMath | CodeCode Available | 2 |
| Compression Represents Intelligence Linearly | Apr 15, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| Sample-Efficient Human Evaluation of Large Language Models via Maximum Discrepancy Competition | Apr 10, 2024 | Code GenerationMathematical Reasoning | CodeCode Available | 0 |
| Evaluating Mathematical Reasoning Beyond Accuracy | Apr 8, 2024 | MathMathematical Reasoning | CodeCode Available | 2 |
| SAAS: Solving Ability Amplification Strategy for Enhanced Mathematical Reasoning in Large Language Models | Apr 5, 2024 | Mathematical Reasoning | —Unverified | 0 |
| Exploring the Mystery of Influential Data for Mathematical Reasoning | Apr 1, 2024 | MathMathematical Reasoning | —Unverified | 0 |
| Can LLMs Master Math? Investigating Large Language Models on Math Stack Exchange | Mar 30, 2024 | MathMathematical Problem-Solving | CodeCode Available | 0 |
| Planning and Editing What You Retrieve for Enhanced Tool Learning | Mar 30, 2024 | Mathematical ReasoningRetrieval | CodeCode Available | 0 |
| Dual Instruction Tuning with Large Language Models for Mathematical Reasoning | Mar 27, 2024 | Domain GeneralizationMathematical Reasoning | —Unverified | 0 |